View DSP56600_2066402.PDF datasheet online --- IC-ON-LINE

Datasheet File OCR Text:

implementing viterbi decoders using the vsl instruction on dsp families dsp56300 and DSP56600 by dana taipale this application report describes how to generate, from a set of convolutional code polynomials, the assembly code needed for implementation of a viterbi decoder. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . . freescale semiconductor ? freescale semiconductor, inc., 2004. all rights reserved.
order this document by number apr40/d (revision 0, may 1998) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
viterbi decoder implementation iii table of contents introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1 1.1 introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3 1.2 viterbi algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3 1.3 manual organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4 the viterbi algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1 2.1 is-136 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 2.2 convolutional encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 2.3 viterbi decoder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4 2.4 algorithmic enhancements. . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12 expanding the viterbi algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1 3.1 introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3 3.2 partitioning the task. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3 3.3 the inner loop: viterbi butterflies . . . . . . . . . . . . . . . . . . . . . . . 3-3 3.4 creating the branch metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6 3.5 storing the paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-11 3.6 traceback: obtaining the decoder output . . . . . . . . . . . . . . . . 3-13 3.7 main: gluing the pieces together . . . . . . . . . . . . . . . . . . . . . . 3-17 3.8 memory organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19 algorithmic extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1 4.1 introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3 4.2 allowing more general branch metrics . . . . . . . . . . . . . . . . . . . 4-3 4.2.1 modify viterbi butterfly. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4 4.2.2 modify branch metric generation . . . . . . . . . . . . . . . . . . . . . 4-6 4.3 starting from 0: the pre acs macro . . . . . . . . . . . . . . . . . . . . . 4-8 4.4 collapsing the states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10 4.5 main: putting the pieces back together . . . . . . . . . . . . . . . . . 4-12 summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1 5.1 summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
iv viterbi decoder implementation 5.2 conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-3 5.3 program listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-4 appendix a basic algorithm program listing. . . . . . . . . . . . . . . a-1 a.1 viterbi algorithm program listing . . . . . . . . . . . . . . . . . . . a-3 appendix b extended algorithm program listing . . . . . . . . . . . b-1 b.1 16-bit enhanced viterbi decoder program listing . . . . . . b-3 appendix c 24-bit algorithm program listing . . . . . . . . . . . . . . c-1 c.1 24-bit enhanced viterbi decoder program listing . . . . . . c-3 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
viterbi decoder implementation v list of figures figure 2-1 example rate 1/2 convolutional encoder. . . . . . . . . . . . . . . . . . . . . 2-3 figure 2-2 single state of prototype encoder tree diagram . . . . . . . . . . . . . . . 2-5 figure 2-3 first five levels of the encoder state tree . . . . . . . . . . . . . . . . . . . 2-6 figure 2-4 using multiple input paths to collapse the state tree . . . . . . . . . . . 2-8 figure 2-5 using intermediate states to eliminate partial paths . . . . . . . . . . . . 2-9 figure 2-6 trellis structure for viterbi decoding. . . . . . . . . . . . . . . . . . . . . . . . 2-10 figure 2-7 example viterbi butterfly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11 figure 2-8 encoder tap symmetry to reduce calculations. . . . . . . . . . . . . . . 2-13 figure 2-9 applying an offset to obtain odd symmetry in the branch metrics 2-14 figure 3-1 stored b0 path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12 figure 4-1 polynomial (1,1+d) trellis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
vi viterbi decoder implementation f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
viterbi decoder implementation vii list of tables table 2-1 input/output mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4 table 3-1 recreated encoder outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8 table 5-1 viterbi decoder code statistics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
viii viterbi decoder implementation f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
viterbi decoder implementation ix list of examples example 3-1 using the vsl instruction in a viterbi butterfly . . . . . . . . . . . . . . . . . 3-4 example 3-2 find branch metrics code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9 example 3-3 partial path storage code listing . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14 example 3-4 traceback output path code listing . . . . . . . . . . . . . . . . . . . . . . . 3-15 example 3-5 main viterbi decoding routine: initialization . . . . . . . . . . . . . . . . . . 3-17 example 3-6 main viterbi decoding routine: patch metric update . . . . . . . . . . . 3-18 example 3-7 main viterbi decoding routine: termination and traceback . . . . . 3-19 example 3-8 memory organization code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-20 example 4-1 modified viterbi butterfly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4 example 4-2 modified branch metric generation code . . . . . . . . . . . . . . . . . . . . 4-6 example 4-3 pre-acs macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9 example 4-4 the acsflush macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11 example 4-5 main program code changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-13 example a-1 basic 16-bit implementation of a viterbi decoder . . . . . . . . . . . . . . a-4 example b-1 extended algorithm program listing . . . . . . . . . . . . . . . . . . . . . . . . b-3 example c-1 24-bit algorithm program listing . . . . . . . . . . . . . . . . . . . . . . . . . . . c-3 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
x viterbi decoder implementation f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
section 1 introduction f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
1-2 viterbi decoder implementation introduction 1.1 introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-3 1.2 viterbi algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-3 1.3 manual organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-4 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
introduction introduction viterbi decoder implementation 1-3 1.1 introduction today?s communication systems typically make considerable use of signal processing to improve their performance. two common functions that use signal processing to improve performance are channel equalization and error correction coding. for equalization, maximum likelihood sequence estimation is among the most popular schemes; for error correction, convolutional coding with viterbi decoding is a method of choice. both communication functions make use of the viterbi algorithm to accomplish their task. to simplify the explanations presented here, the focus is on using the algorithm for error correction. this application report is written with the intent of instructing the reader who, with a set of convolutional code polynomials, can generate the assembly code needed to implement a viterbi decoder on the dsp56300 and DSP56600 families of digital signal processors (dsps). the dsp56300 and DSP56600 families of dsps are designed to implement communication functions. because the viterbi algorithm occupies such a prominent position in many communications systems, these processor families have a special instruction, viterbi shift left (vsl). this instruction is designed to improve the performance of the processor when doing viterbi algorithm operations. one goal of this application note is to explain the use of the vsl instruction when implementing the viterbi algorithm. 1.2 viterbi algorithm communication applications that use convolutional coding or sequence estimation often use the viterbi algorithm to efficiently process the received data. these applications occur so often that the dsp56300 and DSP56600 families have a special instruction to enhance the algorithm?s implementation. this application note begins by introducing the viterbi algorithm, then proceeds step by step through an implementation of that algorithm for a specific error correction example. in addition, use of the new vsl instruction is explained for obtaining efficient coding for applications requiring the viterbi algorithm. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
1-4 viterbi decoder implementation introduction manual organization 1.3 manual organization section 2 of this manual covers many basics of the viterbi algorithm and introduces much of the terminology used in this application note. those already acquainted with the algorithm may wish to proceed directly to section 3 . section 3 shows how to develop assembly language routines for the viterbi algorithm. this is done by means of an example, taking a set of encoding polynomials from the wireless interim standard is-136 and developing the assembly routines needed to implement the corresponding decoder using the viterbi algorithm. section 4 discusses some code modifications that are useful for generalizing and optimizing the example code. the code-to-cover generalized branch metrics is extended, and an efficient method for normalizing the path metrics is introduced. in addition, two additional sets of code are given for more efficient processing of data periodically forced through a known state. section 5 summarizes the application report and presents interesting data and statistics arrived at by comparing the basic viterbi algorithm code of section 2 with that of the modified program code presented in section 3 and section 4 . appendix a contains the complete program listing for section 3 . appendix b contains the complete program listing for section 4 . appendix c contains the complete 24-bit program listing for section 4 . f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
section 2 the viterbi algorithm f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
2-2 viterbi decoder implementation the viterbi algorithm 2.1 is-136 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-3 2.2 convolutional encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-3 2.3 viterbi decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-4 2.4 algorithmic enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-12 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
the viterbi algorithm is-136 viterbi decoder implementation 2-3 2.1 is-136 this section introduces the viterbi algorithm through an example from industry: the convolutional error correcting code used in is-136, the tia/eia interim wireless standard. here it is shown how the algorithm is derived, and some typical computational enhancements are introduced. 2.2 convolutional encoding the viterbi algorithm can be usefully illustrated by implementing a convolutional code. a convolutional encoder implementation in hardware is illustrated schematically in figure 2-1 . in order to illustrate the steps needed to implement viterbi decoders in general, this application report develops the assembly code needed to implement a viterbi algorithm decoder for this encoder. figure 2-1 example rate 1/2 convolutional encoder in this figure, the square boxes represent a one-bit-wide shift register. some shift register outputs are connected to modulo 2 adders (i.e., parity generators). every time one bit is shifted into the input, two output bits are generated by the adders. because we have more output bits than input bits, we have redundancy. the redundant information can be used to correct errors at the receiver. s 1 s 0 s 2 s 3 s 4 1 d d 2 d 3 d 4 d 5 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
2-4 viterbi decoder implementation the viterbi algorithm viterbi decoder error correcting codes have some standard notation. this example, used in the wireless standard is-136, is a rate 1/2 code, meaning that for each processing period, one information bit is taken in and two bits of output are generated. the word (s 4 ,s 3 ,s 2 ,s 1 ,s 0 ) is the encoder state. the remaining bit of the encoder we call the input bit. there is a convenient notation that describes the encoder using polynomials. this encoder can be described with the encoding polynomials (1+d+d 3 +d 5 ,1+d 2 +d 3 +d 4 +d 5 ). each factor d corresponds to a one clock delay for that adder input. given the encoding polynomials, we can begin to design the viterbi decoder. 2.3 viterbi decoder to design a viterbi decoder, begin by taking some example encoder input and generating the corresponding output. an example appears in table 2-1 , where it is assumed that the encoder is 0 filled at the start. for proper decoding, we need to recreate the encoder states and find the set of state changes, or transitions, that produce an encoder output that best agrees with our decoder input data. we begin using the same assumption we used in generating the encoder output in table 2-1 . we assume that the initial state of the encoder is 00000. table 2-1 input/output mapping to compare our recreated encoder state with the decoder input, we keep track of the agreements between the recreated encoder outputs and the decoder inputs. the cumulative agreement for a path leading to one particular recreated encoder state is called a path metric for that path. incremental agreements are called branch metrics. encoder input encoder output 111 010 110 110 010 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
the viterbi algorithm viterbi decoder viterbi decoder implementation 2-5 figure 2-2 shows a prototype for one state of the state tree for this decoder. figure 2-3 shows the entire tree used by the decoder to track the first five input pairs. the state of the recreated encoder appears in the box. along each arrow (a state transition) appears a number pair showing the corresponding encoder output bit pair for that state transition (the output produced immediately before the state changes). this figure is ordered so that transitions corresponding to an input 0 are always the upper path and transitions corresponding to an input 1 are always the lower path. to determine the encoder output for a given transition, load the encoder with the state and use a 0 or 1 for the input bit, according to the transition chosen. the encoder polynomials then determine the encoder output, as shown in figure 2-1 . figure 2-2 single state of prototype encoder tree diagram state transition encoder output / branch metric path metric state f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
2-6 viterbi decoder implementation the viterbi algorithm viterbi decoder figure 2-3 first five levels of the encoder state tree 11 10 10 input: 00/0 00000 00001 00010 00000 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111 00010 00011 00100 00110 01000 01001 01010 01011 01100 01101 01110 01111 00000 00001 10 10 00/1 00/1 00/1 00/1 11/2 10/2 01/0 11/1 11/1 10/2 01/0 01/0 10/2 11/1 00/1 11/1 10/2 01/0 01/0 10/2 11/1 00/1 11/1 01/0 10/2 10/2 01/0 00/1 11/1 11/1 10/2 01/0 01/0 10/2 11/1 00/1 11/1 00/1 01/0 10/2 10/2 01/0 00/1 11/1 01/0 10/2 11/1 00/1 00/1 11/1 10/2 01/0 10/0 01/2 00/1 11/1 11/1 00/1 01/0 10/2 00 11 10 01 01 10 11 00 11 00 01 10 10 01 00 11 01 10 11 00 00 11 10 01 10 01 00 11 11 00 01 10 11 00 01 10 10 01 00 11 00 11 10 01 01 10 11 00 10 01 00 11 11 00 01 10 01 10 11 00 00 11 10 01 0 0 2 1 1 4 2 2 2 3 1 4 6 3 3 3 3 4 2 3 5 2 2 5 5 6 8 5 3 4 4 00/1 00011 00010 00111 00110 00101 00100 00001 00000 00011 00111 00000 00000 00010 00001 00101 00001 aa1542 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
the viterbi algorithm viterbi decoder viterbi decoder implementation 2-7 comparing the recreated encoder output with the decoder input, we determine the number of agreements, shown in figure 2-3 as the branch metric. we find a branch metric for each transition. for each state, we track the cumulative branch metrics to form the path metrics. these are shown as a number appearing above each state box. to recreate the correct input sequence, we choose the recreated encoder path that best agrees with the decoder input data. in this case, the best path is the one with the largest final path metric. for clarity, the path metrics for the best path are distinguished with a larger font size and bolder arrows. to recreate the input sequence (so far), we can use two methods. the easiest is to use the state as the decoder output. unfortunately, this method will not work after we finish the development of the decoder. the second method will work when we are done. to obtain the decoder output, trace the best path (the one with the largest final path metric) back to the beginning. now, follow the same path forward again to obtain the input sequence by placing a 0 at the decoder output each time we choose an upper transition, and a 1 output each time we choose the lower transition. using this method on the tree in figure 2-3 gives us 10110, which agrees with the encoder input example. the most troublesome aspect of this decoder is that the number of states we have to track for each decoder input is actually the number of possible paths. for this coding example, the number of states doubles for each input. for any reasonable number of inputs, the amount of work and storage needed for this decoder is far too large to be practical. to solve this problem, begin by noting that we don?t really need all the data generated by the decoder. in particular, all the work goes toward finding the path that best agrees with the input. we only need the path that gives us the largest path metric. if we determine that a path cannot ever have the largest path metric, we can ignore that path for all future calculations. to collapse the ever-growing tree in figure 2-3 , consider what happens if we continue the tree for one more pair of decoder inputs. the total number of states would be 64 for the next input pair. to keep the diagram manageable, only a pair of specially chosen states appears in figure 2-4 . when we extend the tree to the next state, we get states with six bits. note, however, that the encoder we are attempting to trace only needs five bits to determine its output bits. to emphasize this, the sixth (leading) bit is separated in the state boxes. because the extra (leading) bits do not affect the recreated encoder outputs, we can ignore them. as a result, the 64 states collapse into 32 states again. the only resulting complication is that each state now has two rather than one entering paths. figure 2-4 shows this state collapse as well as each state?s multiple input paths in two example states. to correctly process the decoder input, we must next determine which of the input paths to keep for each state. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
2-8 viterbi decoder implementation the viterbi algorithm viterbi decoder figure 2-4 using multiple input paths to collapse the state tree to find out how to choose the correct input branch for each state, we use the additivity of the path metrics. suppose we want to find the path with the largest metric that extends from time i to time k (ipm2. then any path containing p2 cannot be the path with largest path metric for any time after time j. 00000 0 00001 0 00000 10000 1 00001 1 00000 00 11 11 00 00000 10000 00000 00001 00 11 11 00 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
the viterbi algorithm viterbi decoder viterbi decoder implementation 2-9 to understand why this is so, note that the total path metric for is pm1+qm1. similarly, has path metric pm1+qm2, has path metric pm2+qm1, and has path metric pm2+qm2. suppose is a candidate for largest path metric. then is still larger, as pm1+qm2 >pm2+qm2 (remembering we assumed that pm1 >pm2). this is true for any q2. so, if pm1 > pm2, we can eliminate p2 from further consideration at time j, without waiting for the future inputs from time j to time k . figure 2-5 using intermediate states to eliminate partial paths in our example, each state will have two paths entering it at any time. the number of paths doubles each time, but we can now eliminate half of the paths each time as well. this is the basis of the viterbi algorithm. to use it, we must keep track of the best path metric for each state up to the current time. by eliminating paths that can never have the largest path metric, however, we keep the amount of computation constant with time. to diagram what is going on, we need a different figure than the tree in figure 2-3 . we need only keep track of each (encoder) state at each time. the result is called a trellis because it resembles one. a diagram of a trellis for our is-136 code appears in figure 2-6 . the beginning of the trellis does not look the same for each time (each decoder input is a stage in the diagram). this is because we assumed the encoder started in state 00000. in general, the encoder can be in any state, and the trellis repeats. for our figure, this can be seen in stages 6 and 7 (after 5 stages, our encoder can be in any state, so we get a steady state in our trellis diagram). each line in the trellis is a transition, just like the ones in the tree. for the trellis, however, the number of states is bounded. the number of states for this trellis is determined by the encoder which has five state bits. this means there are 2 5 = 32 states in the trellis. p 1 q 1 p 1 q 2 p 2 q 1 p 2 q 2 p 2 q 2 p 1 q 2 state s p1 p2 q1 q2 time i time j time k f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
2-10 viterbi decoder implementation the viterbi algorithm viterbi decoder figure 2-6 trellis structure for viterbi decoding 00000 00001 00010 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111 stage 12 34 5 6 7 state aa1543 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
the viterbi algorithm viterbi decoder viterbi decoder implementation 2-11 fortunately, we do not have to examine all of this trellis at once! for an idea of the basic processing, we can examine states in pairs. in particular, note that we only need path metrics of two source states to update a pair of destination states. note that the source state pairs are not the same as the destination state pairs. for our example code, the source state pair 0abcd and 1abcd provides the path metrics needed to update the destination state pair abcd0 and abcd1, where abcd is fixed for each state pair update. let?s examine a single pair update. this is sometimes called a viterbi butterfly. an example butterfly is shown in figure 2-7 . . figure 2-7 example viterbi butterfly to compare the input paths to the state being updated, we must take the path metric for each state on connecting input branches and add the branch metric associated with that path. in the example shown in figure 2-7 , the lower branch from state 00100 has a recreated encoder output of 00. as noted before, this can be determined by finding the encoder output for a state of 00100, along with an input bit of 1 because we are taking the lower branch. assuming a decoder input of 11, we find that neither bit of the recreated encoder output agrees with the decoder input. we assign a branch metric of 0 (no agreements). we can do similar assignments for the other branches, as shown in figure 2-7 . to update the states, we add the path metrics to the branch metrics. then we compare the sums and choose the larger as the surviving branch. the sum of the surviving path becomes the path metric for the updated state. we must do this update for all state pairs, for each decoder input. when implementing a viterbi decoder in software, this update is the most computation-intensive and creates a do loop that should be optimized to obtain good performance. in the next section, we will develop dsp56300 assembly code to implement this algorithm. 00100 10100 01000 01001 5 6 11 00/0 11/2 11/2 00/0 00100 10100 01000 01001 5 6 11 11/2 11/2 7 8 state path metric encoder out/ branch metric decoder input f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
2-12 viterbi decoder implementation the viterbi algorithm algorithmic enhancements 2.4 algorithmic enhancements there are a number of modifications that can be made to the algorithm in order to improve its effectiveness as well as our ability to implement it. in our example, we count agreements. this produces branch metrics of 0, 1, and 2. in our example butterfly, however, the branch metrics into the state pair are all 0?s and 2?s. this is not entirely a coincidence. for this code, the encoding polynomials both have taps at both extremes. what this means is that a 1/0 at the end of the encoder will invert/not invert the encoder output. for our trellis, this means that the pair of branches into a given state will always have complementary branch metrics (0 and 2, or 1 and 1). by combining this observation with one more property, we can simplify the viterbi butterfly computation. the additional property for this code (shown in figure 2-8 ) is that a 1/0 at the input bit will invert/not invert the encoder output. for our trellis, this means that the two states of a state pair being updated will have the same branch metrics, but with their order reversed (0 and 2 will become 2 and 0, etc.). these polynomial conditions are not true in general but do occur quite often. to take advantage of them, we subtract 1 from every branch metric we produce. this gives branch metric pairs of (-1,1) and (0,0). as shown in figure 2-9 , each pair now has odd symmetry in the two components. hence, we need not compute branch metrics for both branches in an update. instead, we can find just the uppermost branch metric, and add it as usual for the upper branch. next, subtract rather than add it to do the lower branch processing. to process the lower state update, we subtract the branch metric for the upper path and add to update the lower path. the total path metrics change, but their relative values do not. since we are only looking for the path with the maximum path metric, the results are the same. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
the viterbi algorithm algorithmic enhancements viterbi decoder implementation 2-13 figure 2-8 encoder tap symmetry to reduce calculations 00100 10100 01000 01001 11 00/0 11/2 decoder input s 1 s 0 s 2 s 3 s 4 1 d d 2 d 3 d 4 d 5 00100 10100 01000 01001 s 1 s 0 s 2 s 3 s 4 1 d d 2 d 3 d 4 d 5 11/2 00/0 if all polys have taps at the input , branches into the same state will have complement branch metrics. if all polys have taps at the last (s 4 ) state bit, branches into the lower state pair will be flipped versions of the upper state branches. 1 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
2-14 viterbi decoder implementation the viterbi algorithm algorithmic enhancements figure 2-9 applying an offset to obtain odd symmetry in the branch metrics a second modification to the algorithm is the ability to use soft decisions. by this we mean that the input data need not be just a 0/1 decision on a receiver statistic. we can assign intermediate values to indicate the reliability of the decision. in this way, a 0.001 can be a very reliable 0, a .75 can be a moderately reliable one, and a .49 is just barely a zero. the viterbi algorithm generalizes to handle soft decisions with no trouble; the additions for the metric updates simply have more bits. by using soft decisions, the ability of the decoder to correct errors is greatly improved. note that from an implementation on our dsp view, this modification is almost invisible, as 16-bit or 24-bit arithmetic will be done regardless of the number of bits in the branch metrics. of course, provisions to protect path metric values from overflow may have to change for some systems. 00100 10100 01000 01001 11 00/-1 11/1 decoder input s 1 s 0 s 2 s 3 s 4 1 d d 2 d 3 d 4 d 5 00100 10100 01000 01001 s 1 s 0 s 2 s 3 s 4 1 d d 2 d 3 d 4 d 5 11/1 00/-1 after subtracting one from each branch metric, the updates can be be done by adding and subtracting 1. similarly, the lower state update can be done by subtracting 1 first, then adding 1 for the lower branch. same metric, but the add/sub order is reversed. 1 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
section 3 expanding the viterbi algorithm f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
3-2 viterbi decoder implementation expanding the viterbi algorithm 3.1 introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-3 3.2 partitioning the task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-3 3.3 the inner loop: viterbi butterflies . . . . . . . . . . . . . . . . . . . . . . . .3-3 3.4 creating the branch metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-6 3.5 storing the paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-11 3.6 traceback: obtaining the decoder output . . . . . . . . . . . . . . . .3-13 3.7 main: gluing the pieces together . . . . . . . . . . . . . . . . . . . . . . .3-17 3.8 memory organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-19 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
expanding the viterbi algorithm introduction viterbi decoder implementation 3-3 3.1 introduction having introduced the principal concepts involved in the viterbi algorithm, we can proceed to generate code to implement it. there are, however, a number of details that need attention. we have concentrated on the viterbi butterfly.the next step involves, among other things, generating the branch metrics and allowing for multiple word traceback to recover the output data. up to now, we have not even discussed how the path decision might be stored. the example code in this section shows a basic implementation of the viterbi algorithm implemented on the DSP56600 assembly language. these examples decode data encoded using the convolutional code for is-136. 3.2 partitioning the task in this section, we develop code to implement the entire algorithm. we will start with the most critical section, the viterbi butterfly code. we then work outward, developing branch metric storage code and traceback instructions. all code assumes 16-bit registers and will run as-is on the DSP56600 dsp family. it also runs correctly on the dsp56300 family of dsps when they are set in 16-bit arithmetic mode. a complete code listing of the modules presented here appears in appendix a . the code can be easily modified for 24-bit operation by changing one line of code, as shown in the detailed traceback code discussion. note that the test data provided is 16-bit data, and would have to be changed to 24-bit data to test correctly in 24-bit arithmetic mode. also, appendix c has a 24-bit version of the code discussed in section 4 . 3.3 the inner loop: viterbi butterflies we begin by discussing the code used to do a viterbi butterfly (updating the states). the easiest way to present the code for this section is to display the instructions, then discuss their function. the code utilizes the viterbi shift left (vsl) instruction, designed for dsp56300 and DSP56600 dsp families to enhance their ability to execute the viterbi algorithm. the code appears in example 3-1 . f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
3-4 viterbi decoder implementation expanding the viterbi algorithm the inner loop: viterbi butterflies example 3-1 using the vsl instruction in a viterbi butterfly ;*******************viterbi add, compare, select butterfly macro*** ; function: update path metrics/paths for the viterbi algorithm by ; doing an add, compare, select update for state pairs. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r4 should point to the storage location for updated state 0 ; n5 should offset addresses by numstates/2 ; outputs: ; updated path metrics/paths stored in xy memory ; registers used: ; a,b,x0,y1,r2,r4,r5,n5 r2 unchanged (modulo required) ; registers: ; r5, pointer to the path metric/path table, arranged as ; x: path metric, y: path, states ordered assuming ; bits shift right to left. ; r4, pointer to the output path metric/path table ; r0, pointer to the branch metric table, arranged ; as x:c, y:d, cd,cd,cd, etc. ; ; sa------nsa ; \ c / ; d \ / ; \/ ; /\ ; d / \ ; / \ ; sb------nsb ; c ;***************************************************************** ; acs macro ; ; move #bry,r2 ;r2 points to branch metrics move y:(r2)+,y1 ;get first branch metric move l:(r5)+n5,a ;load 1st metric/path pair ; do #noofacsbutt,_p_nextstage ;update each state sub y1,a l:(r5)-n5,b ;sub pt,br met,get next pair add y1,b ;update metrics max a,b l:(r5)+n5,a ;pick max,reload 1st pair vsl b,0,l:(r4)+ ;store survivor, end top half add y1,a l:(r5)-n5,b ;add pt,br met, reload next pair sub y1,b x:(r5)+,x0 y:(r2)+,y1 ;inc st ptr,ld nxt br met max a,b l:(r5)+n5,a ;pick max met,load next pair vsl b,1,l:(r4)+ ;st survivor,end 2nd half _p_nextstage nop ;needed to separate do loop ends endm f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
expanding the viterbi algorithm the inner loop: viterbi butterflies viterbi decoder implementation 3-5 we now explain the routine line by line. begin by moving the address of the precomputed branch metric table to the r2 address register. in our final, combined routine, this is unnecessary because the desired value already exists in r2, but we include it as a comment here to make it clear that this value is needed. branch metric creation will be discussed in section 4.2 . next, we update the address register that points to the states used to update. the memory that stores the path metrics is divided into two parts. in this routine, r5 points to the old path metrics, used to update the new ones. address register r4 points to the new states, those being updated. for each new decoder input, we swap the memory used to store the updated path metrics with the memory pointing to the old path metrics. to do this, we initialize address registers r4 and r5 to be modulo registers, with a modulus of twice the number of states. address incrementing for the butterfly is so designed that at the end of the loop, r4 and r5 have automatically swapped pointer values. thus, no instructions need be dedicated to swapping the memory. the path metrics are stored in x memory. collocated with them in y memory are the paths. by paths, we mean a word that contains the bit decisions describing the recreated encoder input data that would produce that path in the decoder. how we get these will become clear below as we go through this code. by collocating the path metrics with their respective paths, we can get both by doing long memory moves. the next two moves load the first branch metric, and the first (state 00000) path metric/path pair. now we are ready for the loop. it is necessary to update every state, and we update states in pairs. hence, the number of loop passes is equal to the number of states divided by 2. in our example, this is the value of noofacsbutt (i.e., 16). because we have polynomials with taps at both ends of the encoder, we can use one branch metric, and add and subtract to update our states. we begin by subtracting the branch, loading the second path metric/path pair at the same time. next, we add the branch metric to the path metric we just fetched. note that for both the subtract and add of the metric, the path metrics in a1 and b1 are affected, but because we are not performing a long word add, the paths in a0 and b0 are not. to compute an updated metric, choose the largest result of the subtract/add operations. the max instruction puts the updated result in b. note that all of a is transferred if a1 is the survivor path, which means that b0 holds the path bits that represent the survivor path. this instruction also reloads the first path metric/path pair for use in updating the lower state later in the loop. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
3-6 viterbi decoder implementation expanding the viterbi algorithm creating the branch metrics the next instruction is the vsl. vsl is a mnemonic for viterbi shift left, a new instruction tailored for viterbi algorithm updates. the action of this instruction is to take an accumulator a or b, store the mid-register (a1 or b1) in x memory, shift the accumulator left, append a 0 or 1, as indicated by the instruction arguments, and store the low register (a0 or b0) in y memory. the net result is that path bits are updated and the path metric/path pair stored in memory. this vsl puts a 0 in the lsb of b0. in this way we store the input bit of this transition (which is 0 regardless of the path chosen because the input bit is 0 as long as the destination state is the upper state). because we are storing recreated encoder bits, this path string will be a recreation of the encoder input, which is what we want for the decoder output. of course, we can only store 16 or 24 bits of the path at a time (16 for DSP56600 or dsp56300 in 16-bit arithmetic mode, 24 for dsp56300 otherwise). next, repeat the operations for the lower state update. note we subtract and add, rather than add and subtract, as required by the lower state update. the add instruction reloads the second path metric/path pair, while the sub instruction increments the path metric fetch pointer and loads the branch metric from the branch metric table, both for use in the next loop iteration. the increment of the path metric fetch pointer is a dummy read into x0 (i.e., x0 is not used). this allows us to use a second parallel move to increment r5. the max instruction finds the survivor path metric/path pair for the lower state, and preloads the path metric/path pair for the next loop iteration. the last vsl instruction shifts the survivor path left 1 bit, appends a 1 to represent the recreated encoder state for the lower state, and stores the lower state path metric. we now iterate this loop by the number of butterflies needed, and exit the macro. 3.4 creating the branch metrics to present the branch metric routine, we start with the encoding polynomials. for this example, we start with the encoding polynomials. we then show how to create the branch metrics we need (in the order required) to do the butterfly correctly. recall that the encoding polynomials are 1+d+d 3 +d 5 and 1+d 2 +d 3 +d 4 +d 5 . there are 32 states in the decoder, so we need 32 branch metrics. in general, we would access these metrics in pairs in the butterfly routine, updating the states in pairs. as noted above, we can save the work involved in half of these, because our polynomials induce a symmetry in the branch metrics. by this we mean that the branch metrics for this code have the property that the upper and lower input branches to any state are complementary . f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
expanding the viterbi algorithm creating the branch metrics viterbi decoder implementation 3-7 thus, by using a shifted version of the branch metrics, we only need one branch metric for each state pair update. this metric is the one that gets subtracted from one path and added into the other. we need only generate and store 16 branches. any given branch will, for a specified input, have one of four values. the four values will represent how well the decoder input compares to the recreated encoder outputs of 00, 01, 10, 11. for our example, we assume some soft decision input. the decoder input data is shifted and scaled so that 1 is changed to 16, and 0 is changed to -16. we obtain two partial branch metrics b 0 and b 1 , where b 0 is the partial branch metric for the encoder polynomial 1+d+d 3 +d 5 output, and b 1 is the partial branch metric for the 1+d 2 +d 3 +d 4 +d 5 polynomial output. we can then compute the four branches as b 0 +b 1 , b 0 -b 1 , -b 0 +b 1 , and -b 0 -b 1 , corresponding to recreated encoder values of 11, 10, 01, and 00. the butterfly loop needs the branches in a specific order dictated by the encoding polynomials. as seen in figure 2-2 , we update states 2i and 2i+1 using states i and i+16. we only need the 16 branches for the 16 state pairs. the state gives us five of the six values needed to determine the encoder output. the remaining value can be found by examining the butterfly loop assembly code. in the loop, note that we subtract the branch metric first (from the upper branch), then add it to the lower branch. the butterfly loop is written assuming that the single branch metric value has the correct sign to be added in the lower branch. this is equivalent to assuming the encoder input bit is a 1 (for the purposes of computing the branch metric table). we account for this in the code by choosing positive 16 for 1?s and minus 16 for 0?s (as noted in the preceding paragraph). the resulting recreated encoder outputs for the first 16 states are shown in the following table. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
3-8 viterbi decoder implementation expanding the viterbi algorithm creating the branch metrics table 3-1 recreated encoder outputs the code needed to implement the values in this table appears in example 3-2 . state input bit encoder output map encoder output to branch metric branch metric for example decoder 10 input 00000 1 11 b 0 +b 1 16 + (-16) = 0 00001 1 01 -b 0 +b 1 (-16) + (-16) = -32 00010 1 10 b 0 -b 1 32 00011 1 00 -b 0 -b 1 0 00100 1 00 -b 0 -b 1 0 00101 1 10 b 0 -b 1 32 00110 1 01 -b 0 +b 1 -32 00111 1 11 b 0 +b 1 0 01000 1 10 b 0 -b 1 32 01001 1 00 -b 0 -b 1 0 01010 1 11 b 0 +b 1 0 01011 1 01 -b 0 +b 1 -32 01100 1 01 -b 0 +b 1 -32 01101 1 11 b 0 +b 1 0 01110 1 00 -b 0 -b 1 0 01111 1 10 b 0 -b 1 32 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
expanding the viterbi algorithm creating the branch metrics viterbi decoder implementation 3-9 example 3-2 find branch metrics code ;*****************branch metric macro************************************* ; function: input data and generate branch metrics. ; for this decoder, the metric is a scaled ; sum or difference of the real and imag inputs. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r1 should point to the next input xy data pair ; outputs: ; branch metrics are stored at brx in xy memory ; registers used: ; a,b,x01,y01,r1,r2,n2, r2 unchanged (modulo req'd) ;**************************************************************** ; findmetrics macro move l:(r1)+,y ;grab dec input move #-16,x1 ;sign for real component, 0 sent. mpy x1,y1,a ;a has 0x partial branch move #bry+3,r2 ;storage for generated branch metrics mac x1,y0,a a,b ;a gets 00 branch mac -x1,y0,b ;b has 01 branch ; neg a a,x1 a,y:(r2)+n2 ;mv 00 to x1,11 to a, st 00 in location 3 neg b b,x0 b,y:(r2)+n2 ;mv 01 to x0,10 to b, st 01 in location 6 ;***************************************************************************** ; at this point x1 has 00, x0 has 01, ; a1 has 11, b1 has 10, needed for quick storage in y memory ;***************************************************************************** move x1,y:(r2)+n2 ;store 00 in location 9 move x0,y:(r2)+n2 ;store 01 in location 12 move b,y:(r2)+n2 ;store 10 in location 15 move b,y:(r2)+n2 ;store 10 in location 2 move b,y:(r2)+n2 ;store 10 in location 5 move b,y:(r2)+n2 ;store 10 in location 8 move x0,y:(r2)+n2 ;store 01 in location 11 move x1,y:(r2)+n2 ;store 00 in location 14 move x0,y:(r2)+n2 ;store 01 in location 1 move x1,y:(r2)+n2 ;store 00 in location 4 move a,y:(r2)+n2 ;store 11 in location 7 move a,y:(r2)+n2 ;store 11 in location 10 move a,y:(r2)+n2 ;store 11 in location 13 move a,y:(r2) ;store 11 in location 0. r2-> bry endm f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
3-10 viterbi decoder implementation expanding the viterbi algorithm creating the branch metrics the findmetrics macro is for the most part straightforward. we begin by loading the decoder input data to compute the branch metrics. next, load the scaling factor, and multiply to obtain the partial branch metric with one of the inputs. the next line initializes address register r2 as a pointer to the table location used by state 3, where the branch metrics are to be stored. we then finish the computation of the branch metric for a branch with encoder output 00, followed by the metric for 01. with these two metrics, we begin to load the branch metric table. we can load the branch metric table in any order. to minimize the number of cycles needed, we apply a few constraints. first, it is easier if we load the table in some consistent manner such as using a constant address offset each time. second, it is convenient if we end up at the address used by state 0, because then we are initialized for the butterfly state update. also, we must finish generating the metric values for 11 and 10. it is most efficient if we can do this in parallel with the moves to load the branch metric table. the easiest way to store the branch metrics might be to start at state 15 and count down. if we do this, it turns out that a stall is unavoidable: we cannot generate all of the required branch metric values in time to use them with this ordering. other loading orders are easy to obtain by using an address register increment that is relatively prime to the table length (and using a modulo addressing mode to wrap around). for our table length, any odd number increment will ensure that we cover the entire table with constant increments. our first candidate is 3. if we start at state 3, and increment by 3?s, we will cover the entire table and end up at the address for state 0. these first three branch metric values are associated with recreated encoder outputs of 00, 01 and 00. because the repeat of 00 gives us time to generate the extra branch metric values with no stall, we use increments of 3. we begin by writing the branch metric for state 3. at the same time, we negate a to compute the metric for branches with encoder output 11, saving the metric for 00 in x1. for the next write, we negate b to compute the metric for 10, saving the metric for 01 in x0. we write to the branch metric table at the same time. what follows are writes to the branch metric table. using modulo addressing, we increment the address by 3?s until the entire table is filled. we write the branch metrics determined by the encoder polynomials as they appear in table 3-1 . finally, note that this code has been constructed so that the address register r2 points to the beginning of the branch metric table at the end of this routine. because of this, the butterfly loop is automatically initialized to load branch metrics by the end of this routine. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
expanding the viterbi algorithm storing the paths viterbi decoder implementation 3-11 3.5 storing the paths for some applications of the viterbi algorithm, the paths will not need to be stored. this occurs when the memory of the encoder (or channel, for maximum likelihood sequence estimation) is small enough for the paths in the algorithm to merge quickly. in other situations, this may not be adequate. to clarify the storage process, we introduce the idea of traceback. assume the decoder has been processing data for some time. if we examine the paths up to the time of the current input, it is not clear what path should be chosen. in fact, choosing a path to decide what decoder output should correspond to the current input would lead to poor performance. this is because we do not have all the data we can get. because the current encoder input has an effect on the decoder inputs for some time into the future, future decoder inputs can be used to increase the reliability of current decoder output decision values. as a result, we introduce delay into the decoder, waiting for future inputs to be processed before making a decision. for the viterbi algorithm, this means that while the current array of paths may not have reliable decisions about the present, they do contain reliable data about the past. in practice, there is a traceback depth for the decoder, the number of states we need to look back before the decision data is assumed to be reliable. for this depth, it is highly likely that the paths have converged. this means that all current paths have the same data up to this time in the past. if this depth is less than the register size used for the paths (i.e., 16 bits for the DSP56600 and the dsp56300 in 16-bit arithmetic mode, 24 bits for dsp56300 otherwise), we can choose a path and output the most significant bits of the register as decisions. if this is not the case, we must make some provision to store the paths. for our example, a traceback depth of 16 bits is inadequate. a depth of 24 bits is extremely marginal for this example, while 35 bits is probably more than needed. hence, we implement traceback storage. for this example, we actually store paths for the entire sequence length (assumed to be 168 bits here), and produce the entire output at the end. it should be easy to modify the code to produce partial output data in stages if desired. the critical concept in storing paths is to arrange the path data so that we can connect the paths for consecutive storage times. the path bits of the survivor path bits at the end must be joined with their preceding path bits. because we stored the preceding path bits in memory, we must know where in memory to look for the survivor path. where in memory do we look? to see how this is done, consider an example path in register b0 just after a viterbi butterfly update. assume there are 13 pairs of decoder inputs processed so far. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
3-12 viterbi decoder implementation expanding the viterbi algorithm storing the paths figure 3-1 stored b0 path note that the current encoder state is part of the path! in retrospect, this must be so, as both the state (the recreated encoder state) and its path (the potential decoder output? the recreated encoder input) represent the same data stream. to use this, store the path byte in a memory location with address defined by the time the path bytes are stored, offset by the current encoder state. then, make sure the saved partial path does not contain the current encoder state. instead, save the current encoder state part of the path to be stored the next time we save paths. to do the traceback, start with the survivor path at the end, and trace the path bits back in time to recover the decoder output. when we recover the stored path as part of our traceback, the final bits in that path point to the offset memory location where we need to read to continue the traceback. hence, if we omit the most recent 5 bits of the path, we can potentially store up to 11 bits of path (assuming 16-bit registers). we have chosen to store 8 in this example because reconstruction is a little easier if we have data in bytes. there is a trade-off. storing more path bits at a time means we don?t need to store paths as often (fewer instructions executed, less memory used), but the traceback reconstruction is more difficult because we have to piece together 11-bit data into 16-bit words. the code in our example code makes one more assumption. we assume that the data is sent in a block, and that the transmitter ends the block by 0 filling the encoder. the effect of 0 filling the encoder is to end everything at state 0. hence, for traceback, start everything at state 0. bit # 15 87 0 1 0 1 0 1 1 1 0 0 0 0 1 0 0 1 1 b0 encoder state stored path byte f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
expanding the viterbi algorithm traceback: obtaining the decoder output viterbi decoder implementation 3-13 for other codes that do not use this assumption, traceback can either begin at state 0 anyway (relying on the convergence of paths to give the correct output at the traceback depth) or the decoder can search for the state with the maximum path metric to begin the traceback. 3.6 traceback: obtaining the decoder output this section presents the code needed to obtain the decoder output from the stored path data. for this example, we have data organized in blocks of 168 bits. because the encoder is flushed to the zero state at the end, and the data for all 168 bits is stored, the traceback can obtain the entire block at once. the most complicated portion of this code is the need to obtain the pointer to the next previous state from the current path data. the code listings for storage and traceback appear in example 3-3 and example 3-4 , respectively. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
3-14 viterbi decoder implementation expanding the viterbi algorithm traceback: obtaining the decoder output to store the paths, begin by recalling the path storage address pointer. in the example code, this is stored in n0. we place copies into r3 (to use as the address pointer for data writes) and r0 (to update the pointer to the next page at the end of the routine). we then mask off the 5 lsbs, and proceed to store the paths. we have designed this macro to process a variable number of states, controlled by the macro argument lpcnt. this variability will be used in section 4 . example 3-3 partial path storage code listing ;***********************store partial path metrics macro*************** ; function: the storage is somewhat twisted. the stored paths are current ; up to the most recent input bit, which is not convenient for ; traceback. i process the data as follows: i pre ; loaded the path with 6 bits. thereafter, i process 8 path bits ; so the path has 14 bits. the most significant 8, i save. ; the remaining bits are the current encoder values for that path. ; the 5 lsb's are the current state. ; inputs: ; n0 should point to memory where the path bits are to be stored ; r5 should point to the latest path metric for state 0 ; outputs: ; updated paths storage x memory, n0 points to next path storage ; registers used: ; a,b,x1,r0,n0,r3,r5 ;********************************************************************* ; ; store off path data in bytes to avoid overflow in path reg's ; storepaths macro lpcnt ; move n0,r3 ;n0 stores path data pointer move n0,r0 move #>$1f,x1 ;mask for 5 lsb's(numencbits of 1's) do lpcnt,_pstore1 ;store paths for each state move y:(r5),b ;grab path asr #5,b,a ;align bits 5-12 with a1 ; (the 8 bits beyond enc) and x1,b ;mask off 5 lsb's to return to path move a1,x:(r3)+ ;store 8 ms path bits move b1,y:(r5)+ ;return 5 ls path bits _pstore1 lua (r4+numstates),r5 ;r5 points to latest states lua (r0+numstates),n0 ;update path data pointer endm f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
expanding the viterbi algorithm traceback: obtaining the decoder output viterbi decoder implementation 3-15 for each path stored, then, we start by reading in the path. we move the path 5 bits to the right, so that a1 has the byte of path data to store right aligned. we mask off all but the 5 rightmost bits in b1 to clear the stored path bits. the loop finishes by writing both the path data to be stored and the updated path for the path metric. the macro finishes by setting r5 for use in the next butterfly, as well as updating n0 to point to the next page of stored path data during the next storage interval. example 3-4 traceback output path code listing ;***********************traceback output path macro******************* ; function: to output the correct data, we begin at the end. we take the ; output path of the survivor state (0), and place its associated ; output path in memory as the last output data byte. then we use ; bits 3-7 of that data as an offset pointer to the correct traceback ; data of the next previous path data memory. we continue this until ; we have traced the data back to the beginning. ; inputs: ; n0 should point to memory where the path bits are to be stored ; r5 should point to the latest path metric for state 0 ; outputs: ; decoder output data in y memory ; registers used: ; a,b,y,r0,n0,r2,r5 ; ;********************************************************************* ; ; store off path data in bytes to avoid overflow in path registers ; traceback macro ; move n0,r0 ;path ptr to r0 move #0,n0 ;prep for traceback move #decout+(numinputs/8/2),r2 ;point to end to trace data move y:(r5),b ;recall last path move #-1,m2 ;r2 now linear move b1,x:(r0) ;save off last path data move #$513,x0 ;control word for extract, 16 bit ; move #$501d,x0 ;control word for extract, 24 bit ; ;*****************begin traceback***************************** ; if (even==0) move x:(r0+n0),a ;recall last path extractu x0,a,b ;bits 3-7 of a1 point to next data lua (r0-numstates),r0 ;dec r0 to next earlier state set lsl #8,a ;move to upper byte move b0,n0 ;load as offset for traceback move a1,y:(r2)- ;save off endif ; f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
3-16 viterbi decoder implementation expanding the viterbi algorithm traceback: obtaining the decoder output we begin the traceback in example 3-4 by moving the path storage pointer to r0 and initializing n0 for its use as an offset. next we position r2 at the end of the buffer that will contain the decoder output. again, this is because we will traceback the path from the end survivor path to the beginning. for our example, we can start at state 0. start by taking the current path for state 0 (r5 points to this) and reading into b. we put r2 in linear addressing mode, then store the last path (the loop will read it again). finally, we load x0 with the control code ($513) to extract the pointer to the next traceback location from the path. the next section of code is used if the number of bytes of data is odd. it decodes the last byte, and places it in the most significant byte of decoder output. as this code is like the last half of the following do loop, discussion of this code is deferred. the do loop processes the traceback two bytes at a time. in this way, it can assemble the decoder output into 16-bit words for more efficient storage. begin by reading the current path into a1. then, extract the pointer to the next (actually previous) path using extract. recall the control register contains $513, which means we take bits $13-$17 of a or, equivalently, bits 3-7 of a1. note that we can load $501d instead and the program will work in 24-bit mode, although the example data provided would require 00 appended to every data symbol to test correctly. as noted above, these most significant bits of the path byte point to the storage address of the path continuation in the previous page of memory. by page, we mean the block of stored path bits for each state. there is one page written each time we store paths (every 8 decoder input periods). next, move r0 to point to the previous page, then move the extracted path pointer bits to n0 to be used in the next traceback read. we end processing of this least significant byte by moving it to x1. do #numinputs/8/2,trcbk ;once for each byte pair move x:(r0+n0),a ;recall last path extractu x0,a,b ;get ptr to next earlier path lua (r0-numstates),r0 ;point r0 to next earlier states move b0,n0 ;save ptr as offset move a1,x1 ;save out byte in x1 move x:(r0+n0),a ;do it all again! extractu x0,a,b lua (r0-numstates),r0 move b0,n0 lsl #8,a ;move to upper byte or x1,a ;or in last byte to get 16 bit word move a1,y:(r2)- ;store result trcbk endm example 3-4 traceback output path code listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
expanding the viterbi algorithm main: gluing the pieces together viterbi decoder implementation 3-17 again, we read the next (previous) path into a1 using r0, which points to the previous page, and n0, which points to the correct path in that page. we extract the new path pointer information, update r0 to the next previous page, and move the path pointer to n0. move the path data to the upper byte, or in the lower byte from x1, and move the resulting word into memory using r2. the loop continues until we have traced back to the beginning, and y:decout points to the start of the decoder output. 3.7 main: gluing the pieces together the only thing left is to put the pieces together. in examples 3-5 through 3-7 , we present the main routine. its function is to initialize registers so everything starts properly, and to invoke the macros at the needed time. example 3-5 begins the viterbi decoding routine by setting equates for the encoder size and input data length, and initializing address registers. numstates is set to the number of encoder states, which for our example is 32. encbits is the number of bits used to encode. this includes the input bit as well as the state bits. noofacsbutt sets the number of acs butterflies for the butterfly loop. for our rate 1/2 code, this is automatically set to half the number of states. the even flag is used by the assembler to include an extra half-byte of traceback if the number of input bytes of data is odd. example 3-5 main viterbi decoding routine: initialization ;***************************main*********************************************** ; numstatesequ 32 encbits equ 6 ;most cases=log2(numstates)+1 noofacsbutt equ numstates/2 numinputsequ 168 even equ 1-(numinputs/8)%2 ;even set to 1/0 if numinputs is ; even/odd #bytes org p:$400 vitdec move #numstates/2-1,m2 ;r2 points to branch metric table move #>3,n2 ;increment for branch metric storage move #state1,r5 ;r5 points to current state metric move #state2,r4 ;r4 points to updated state metric move #numstates*2-1,m4 ;both modulo to flip loc each sym move #numstates*2-1,m5 move #numstates/2,n5 ;input metrics spacing for each butterfly move #pathout,n0 ;n0 points to storage for output paths move #-1,m3 ;set linear mode, traceback ptr ; move #indata,r1 ;r1 points to input data ; f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
3-18 viterbi decoder implementation expanding the viterbi algorithm main: gluing the pieces together next, we begin the executable code. the first two lines initialize address registers m2 and n2 so that r2 addressing wraps around and increments by 3?s?both for branch metric storage. the next 5 lines initialize address registers r4 and r5 to wrap around for path metric/path storage to begin them pointing to the correct places in the storage table. offset register n5 is set to access path metrics spaced halfway apart in the storage table. the initialization finishes by pointing n0 to the path storage table start, making sure r3 is in linear mode for traceback, and pointing r1 to the input data. note that the r1 addressing mode is not set. we assume that an outside calling routine does this, and permit circular input buffers if desired. these next two sections do much of the decoding work. the first loop processes encbits of input data. we process 1 pair of decoder inputs for each iteration of the macros findmetrics and acs. this preprocessing loads encbits-1 of data into the paths. thereafter, we process the decoder input pairs in groups of eight, producing 8 path bits for each state. for every 8 decoder input pairs processed, we invoke the storepaths macro to store the paths. example 3-6 main viterbi decoding routine: patch metric update ; ;***********************preparation loop************************* ; this loop iterates by the number of bits used in the ; encoder to pre load the bit decisions. thereafter, ; the paths are updated and stored off in bytes. we need ; the preload bits so that the stored path metrics point ; correctly to their previous paths for traceback ;**************************************************************** ; do #encbits-1,prelp ;once for each encoder bit ; findmetrics acs prelp ;*****************main loop--process bytes of data***************** do #numinputs/8-1,datalp ;process bytes of output do #8,symlp ;8 bits per byte ; findmetrics acs symlp storepaths#numstates datalp f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
expanding the viterbi algorithm memory organization viterbi decoder implementation 3-19 by preloading encbits-1 and storing the leftmost eight every time we store paths, we ensure that the paths always have (encbits-1)+8 bits at the time we store paths.the extra encbits-1 bits are, as noted above, the current state, and are kept till the next path store time to allow us to traceback. the main loop processes all but the last 8-encbits+1 sets of decoder input data. example 3-7 shows the final processing for the data block. at this time, we have processed all but the last byte, i.e., 8 bits less the encbits-1 processing we did in preprocessing. this means we have 3 bits of data left to process. after processing the last decoder input, the traceback macro is invoked to obtain the decoder output. the decoded data is memory at y:decout, in 16-bit word form. 3.8 memory organization example 3-8 shows all memory organization except the input data defines used to test the code. the first reserved spaces are dedicated to the storage for the path metrics (in x memory) and their respective paths (in y memory). note that these must be collocated. for this code, we initialized the metrics for that state 00 has a large path metric, and the rest are 0. we do not initialize the path metrics in executable code! this allows the decoder to operate on data over multiple invocations if desired. if the decoder is operating on independent data blocks, each started assuming the encoder starts in the 0 state, then this memory will have to be initialized accordingly. another alternative in this case would be to modify the butterfly loop in the preprocessor to use the fact that the encoder starts in the 0 state. this option is explored in section 4 . example 3-7 main viterbi decoding routine: termination and traceback ; ;********postprocessing,last info byte ******************************** ; ; encbits preprocessed, so we have 8-encbits bits left. ; for this example, we have 3inputs left to process ;********************************************************************** ; ; the last three bits------ ; do #8-encbits+1,flsh1;process last 3 bits to get last byte ; findmetrics acs flsh1 ;*******traceback the path data to obtain the decoder output********** traceback fin nop f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
3-20 viterbi decoder implementation expanding the viterbi algorithm memory organization the path metric/path memory must start on a 0 mod 2*numstates boundary for the modulo addressing to work properly. in addition, the branch metric storage bry must start on a 0 mod numstates/2 boundary to operate correctly. finally, the storage of the input data must be paired in x and y memory, because we do long moves to get the decoder input data pairs into the branch metric calculation. example 3-8 memory organization code ;**********************memory organization**************************** ; most of the memory location is important--the first two ; labels of the x and y data memory are paired and must ; be co located (at the same addresses). in addition, state1 ; and state2 must be located on a 0 mod 2*numstates boundary, ; bry must be on a 0 mod numstates/2 boundary. ; x and y memory for input data must be paired --good luck..... ;********************************************************************* ; org x:$0 state1 dc $0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 state2 dc $0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 ; pathout ds numstates*(numinputs/8+1) indata ;this data encodes ;$1234,$5678,$9abc,$4973,$7925,$3491,$ad43,$ff21,$7ebb,$0100,$00 ;.... ;i didn?t include the test data in this listing... org y:$0 path1 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 path2 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 bry dsm numstates/2 decout ds numinputs/8 org y:@cvs(y,indata) ydata dc $a000,$a000,$a000,$6000;.... f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
section 4 algorithmic extensions f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
4-2 viterbi decoder implementation algorithmic extensions 4.1 introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-3 4.2 allowing more general branch metrics . . . . . . . . . . . . . . . . . . . .4-3 4.2.1 modify viterbi butterfly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-4 4.2.2 modify branch metric generation . . . . . . . . . . . . . . . . . . . . . .4-6 4.3 starting from 0: the pre acs macro . . . . . . . . . . . . . . . . . . . . . .4-8 4.4 collapsing the states. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-10 4.5 main: putting the pieces back together . . . . . . . . . . . . . . . . . .4-12 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
algorithmic extensions introduction viterbi decoder implementation 4-3 4.1 introduction this section shows how to adjust the code of section 3 to include several enhancements. generalized branch metrics are allowed, as well as optimizing the code to reduce the computations when data occurs in blocks. the code in this section introduces several modifications to the code discussed in section 3 . section 4.2 modifies the handling of branch metrics to allow nonsymmetrical branch metrics. section 4.3 and section 4.4 develop two additional macros useful if the data is blocked and the encoder is periodically forced through a known state. we put all of these routines together to produce a modified viterbi algorithm decoder that has autonormalized path metrics and efficient coding for blocked data. note that a complete listing of this code appears in appendix b . 4.2 allowing more general branch metrics the code introduced so far is efficient for algorithms that have special branch metrics. in particular, the branch metrics are assumed to be inverses of each other and flipped between state pairs. although this is true for many popular codes in use today, it is not necessarily the case. if a polynomial set is specified where one of the encoding polynomials does not have a tap at one end, the branch metrics lose some of their symmetry. a simple example is the polynomial set (1,1+d), whose trellis appears in figure 4-1 . figure 4-1 polynomial (1,1+d) trellis note that the input branches to state 0 are not complements of each other! we might accommodate this with a fractional branch offset to make branch 00 and branch 01 negatives, but note that the offset for the lower state would have to be different. another approach is to compute separate branch offsets for the upper and lower state updates. this requires a different version of the branch metric generation as well as the butterfly loop. we consider modified code for each of these functions in the examples that follow. 00 + 0 1 11 01 10 s 0 s 0 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
4-4 viterbi decoder implementation algorithmic extensions allowing more general branch metrics 4.2.1 modify viterbi butterfly begin modification with the butterfly loop. the code from section 3 reads in the branch metric from y memory near the loop end (the y:(r0)+,y1 that appears two lines from _p_nextstage). the easiest way to get two branch metrics is to do long reads on the branch metrics. for the code in section 3 , however, finding the space to do the extra branch read is harder. most of the data movement is tightly controlled and cannot be moved in the code without disrupting the data flow. instead, we can make use of the pipeline stall in the viterbi butterfly loop. the modified code appears in example 4-1 . example 4-1 modified viterbi butterfly ;*******************viterbi add, compare, select butterfly macro*** ; function: update path metrics/paths for the viterbi algorithm by ; doing an add,compare,select update for state pairs. ; inputs: ; r2 should point to the beginning of the branch metric table ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r4 should point to the storage location for updated state 0 ; n5 should offset addresses by numstates/2 ; outputs: ; updated path metrics/paths stored in xy memory ; registers used: ; a,b,y01,r2,r4,r5,n5 r2 unchanged (modulo req'd) ; registers: ; r5, pointer to the path metric/path table, arranged as ; x: path metric, y: path,states ordered assuming ; bits shift right to left. ; r4, pointer to the output path metric/path table ; r0, pointer to the branch metric table, arranged ; as x:c, y:d, cd,cd,cd, etc. ; ; sa------nsa ; \ c / ; d \ / ; \/ ; /\ ; d / \ ; / \ ; sb------nsb ; c f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
algorithmic extensions allowing more general branch metrics viterbi decoder implementation 4-5 this code is almost identical to the butterfly code in section 3 , with the following differences. the first executable line is a long move to register y. this allows us to grab two branch metrics at the same time. for this example, we show both branch metrics being added to the path metrics to update the states. line 9 in the loop (the assembly line move l:(r2)+,y) is a long read of two more branch metrics. this is an extra line of code, but it executes in the same number of cycles as the original code because it takes the place of a pipeline stall. finally, note that address register 5 is still incremented, but without a dummy read into a register. the loop uses the same branch metrics for both upper and lower path metric updates. this is what is needed for example 4-1 , where we still have some symmetry because both encoding polynomials have a tap on the input bit. if this is not the case, the code can accommodate other conditions. a more general setup is to uncomment the long move that reads additional branch metrics in line 4 of the loop. this line will also take the place of a pipeline stall, and so does not increase the number of cycles needed to execute the butterfly. ;***************************************************************** ; acs macro ; ; move #bry,r2 ;r2 points to branch metrics move l:(r2)+,y ;get first branch metric move l:(r5)+n5,a ;load 1st metric/path pair ; do #noacs,_p_nextstage ;update each state add y0,a l:(r5)-n5,b ;sum pt,br met,get next pair add y1,b ;update metric max a,b l:(r5)+n5,a ;pick max, reload 1st pair ; move l:(r2)+,y ;load next branch metrics vsl b,0,l:(r4)+ ;save surivior, end top half add y1,a l:(r5)-n5,b ;sum pt,br,reload next pair add y0,b (r5)+ ;sum again,inc pt mt read ptr max a,b l:(r5)+n5,a ;pick max, load next pair move l:(r2)+,y ;load next branch metrics vsl b,1,l:(r4)+ ;write survivor,end 2nd half _p_nextstage nop ;needed to separate do loop ends endm example 4-1 modified viterbi butterfly (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
4-6 viterbi decoder implementation algorithmic extensions allowing more general branch metrics 4.2.2 modify branch metric generation the branch metric generation needs modification also. example 4-2 shows the user a different motivation for noncomplementary branch metrics. in this example, we use convolutional code from section 3 , but take the path metric for state 0 and subtract it from all branch metrics. this is a computationally inexpensive way to automatically normalize the path metrics so that we need not worry about arithmetic saturation of path metrics when decoding very long streams of data. it does, however, destroy the symmetry of the branch metrics. example 4-2 modified branch metric generation code ;*****************branch metric macro************************************* ; function: input data and generate branch metrics. this function ; subtracts the path metric for state 0 from all branch metrics ; to provide autonormaliztion. for this decoder, the metric is a scaled ; sum or difference of the real and imag inputs. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r1 should point to the next input xy data pair ; outputs: ; branch metrics are stored at brx in xy memory ; registers used: ; a,b,x01,y01,r1,r2,r5, r5 unchanged,r2 unchanged (modulo req'd) ;**************************************************************** ; findmetrics macro move x:(r5),a ;st. metric scaling, read last st. 0 neg a l:(r1)+,y ;negate metric|grab dec input move #-16,x1 ;sign for real component, upper br. mac x1,y1,a a,b ;comp 0x partial br. path mt. to move y1,x0 ;mv real input to x0 mac x1,y0,a a,y1 ;a gets 00 br. y1 gets 0x partial br subl a,b b,x0 ;a has 00 br, b has 11 br, save path tfr y1,a a,y1 ;swap 0x and 00 to compute 01 branch mac -x1,y0,a b,x1 ;a gets 01 br tfr x0,b b,y0 ;swap path metric,11 branch to y0 subl a,b y1,x0 ;b gets 10 br, x0 has copy of 00 br. ;***************************************************************************** ; at this point, x is configured with br 11|00, y with 00|11, and ; ab with 01|10, ba with 10|01 all needed for quick storage in xy ; memory. ;***************************************************************************** f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
algorithmic extensions allowing more general branch metrics viterbi decoder implementation 4-7 for this example, we are using the same code as in section 3 to make the unnormalized branch metrics symmetrical. for easy storage in memory, we want to generate metric pairs. each pair should have the values needed for the upper and lower branch metric value needed for the butterfly. for this example, the branch metric for encoder output 00 will be paired with the branch metric value for 11 and the value for 01 paired with the value for 10. all branch metrics will have the previous path metric for state 0 subtracted from them. this means that after the viterbi butterfly update, each path metric value will have been normalized. it can be shown that this limits the worst case path metric values to be no more than 12 times the maximum branch metric value (for this code example). the branch metric macro begins by reading in the last path metric for state 0. it then negates that value in accumulator a and reads the decoder input into register y. the data in y is assumed to be in the form y1:y0 mapped to real:imaginary data input. next, the scaling factor -16 is placed in register x1. the mac instruction finds a partial branch metric by multiplying the real data by the scaling factor and adding the result to the negated path metric value in a. we also save the negated path metric value to b for later use. next, we move the real decoder input from y1 to x0. the next mac instruction multiplies imaginary input by the scaling factor and adds it to a. the accumulator a now has the normalized branch metric for encoder 00 outputs. at the same time, the partial branch metric that was in a is saved in y1 for later use. move x,l:(r2)+ ;store 11 in location 0. move ab,l:(r2)+ ;store 01 in location 1 move ba,l:(r2)+ ;store 10 in location 2 move y,l:(r2)+ ;store 00 in location 3 move y,l:(r2)+ ;store 00 in location 4 move ba,l:(r2)+ ;store 10 in location 5 move ab,l:(r2)+ ;store 01 in location 6 move x,l:(r2)+ ;store 11 in location 7 move ba,l:(r2)+ ;store 10 in location 8 move y,l:(r2)+ ;store 00 in location 9 move x,l:(r2)+ ;store 11 in location 10 move ab,l:(r2)+ ;store 01 in location 11 move ab,l:(r2)+ ;store 01 in location 12 move x,l:(r2)+ ;store 11 in location 13 move y,l:(r2)+ ;store 00 in location 14 move ba,l:(r2)+ ;store 10 in location 15 r2-> brx endm example 4-2 modified branch metric generation code (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
4-8 viterbi decoder implementation algorithmic extensions starting from 0: the pre acs macro the subl a,b instruction does a shift left and subtract operation on b and a to obtain the branch metric for encoder 11 outputs. the negated path metric value is saved in x0. the transfer instruction and its parallel move swap y1 with a to save the 00 branch metric and place the partial branch metric for the next computation. the next mac instruction scales the imaginary part and subtracts it from the partial branch metric to obtain the branch metric for encoder 01 outputs. the 11 branch metric is saved in register x1. the negated path metric is placed in b, and the 11 branch metric is move to y0. the final subl instruction shift left and subtracts the negated path metric from the 01 branch to get the 10 branch metric in b. the 00 branch metric is copied to x0. after all these moves, we find that register x1:x0 contains branches 11:00, register y1:y0 contains branches 00:11, and a1:b1 has branches 01:10. note that we can access accumulators as ab or ba to get branches 01:10 or 10:01. all we have to do now is to place the branch metric pairs in memory to correspond with their states for the viterbi butterfly loop. unlike the case in section 3 , our branch metric generation allows us to use a simple memory placement scheme. we begin with the memory location for state 0, and increment through the states until the r2 address pointer rolls over to 0 after location 15 (r2 is set to be a modulo 16 address pointer). because we have all possible needed branch metric pairs in some register, we just code each move with the branch metric pair for that state. 4.3 starting from 0: the pre acs macro for data that is coded in packets, it is not unusual for the encoder to start in a known state. the easiest case is when the encoder begins each data packet starting in state zero. this is what is assumed for this example. for this case, the beginning trellis is shown in figure 4-1 . notably, the number of states to be updated changes from 2 to 4 to 8, etc., doubling until the full 32 states is reached. until that time, we have fewer computations to do because the number of states is smaller and because no comparison of competing branches is needed until 32 states of the trellis are being used. we can reduce the computation needed for a packet by treating the first five decoder inputs separately. the code to do this appears in the preacs macro in example 4-3 . the first two moves save the state address pointers to be restored at the routine?s end. next the branch metrics are loaded into the accumulators b and a. finally, the path metric/path pair is loaded into x. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
algorithmic extensions starting from 0: the pre acs macro viterbi decoder implementation 4-9 the do loop processes each state update. the branches are added to the state to obtain the state updates. the second of these adds also loads in the next path metric/path pair for the next loop iteration. the vsl instructions store the updated path metrics, and the move loads the next branch metrics. after the loop, we double n5 (used to vary the number of loop iterations), restore r4 and r5 to swap path metric read/write addresses, and reinitialize the branch metric pointer for use by the branch metric update routine. example 4-3 pre-acs macro ;*******************************;preacs*************************************** ; function: update path metrics/paths for the viterbi algorithm by ; acs butterfly from assumed 0 state to start, and double ; the number of states on each invocation until the full ; trellis is used. this routine cannot be used unless the ; encoder is starting from an all 0's state. note this ; means it cannot be used if the data is a continuation ; of a previously processed data stream. use the acs macro ; instead. however, for packetised data, or other data that ; assumes the data starts with the encoder 0 filled, this ; routine saves cycles, and only state 0 needs to be initialised ; before starting the decode. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r4 should point to the storage location for updated state 0 ; n5 should be the number of input states/2 to process ; n5 is doubled each time this macro is invoked ; outputs: ; updated path metrics/paths stored in xy memory ; registers used: ; a,b,y01,r2,r3,n3,r4,r5,n5 r2 unchanged (modulo req'd) ; registers: ; r5, pointer to the path metric/path table, arranged as ; x: path metric, y: path,states ordered assuming ; bits shift right to left. ; r4, pointer to the output path metric/path table ; r0, pointer to the branch metric table, arranged ; as x:c, y:d, cd,cd,cd, etc. ; ; sa------nsa ; \ c ; d \ ; \ ; \ ; \ ; \ ; sb nsb ; f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
4-10 viterbi decoder implementation algorithmic extensions collapsing the states 4.4 collapsing the states for codes that 0 fill the encoder to create an end of data block, we can reduce the number of cycles needed to finish the viterbi decoder processing. example 4-4 assumes the coder ends a block by 0 filling the encoder. for the decoder, this means that the last 5 inputs to a block are known beforehand to take the upper state. there are two cycle-saving consequences to this. first, we need not update the lower state in the butterflies. second, each of the last five inputs creates only half the number of states from the previous input. for the last input, we update state 0 only, giving us a single starting point for traceback. the code appears in example 4-4 . ;***************************************************************** ; preacsmacro ; ; move#bry,r2 ;r2 points to branch metrics mover5,r3 ;save r5 to init r4 later mover4,n3 ;save r4 to init r5 later movel:(r2)+,ba ;get first branch metrics movel:(r5)+,y ;load 1st metric/path pair ; do n5,_p_nextstageupdate each state add y,a ;update metric add y,b l:(r5)+,y;updt met, ld nxt pair vsl a,0,l:(r4)+;end top half ; vsl b,1,l:(r4)+;end 2nd half movel:(r2)+,ba ;ld br met _p_nextstage moven5,b ;recall loop count asl b n3,r5 ;mult it by 2 mover3,r4 moveb,n5 ;storage for loop count move#bry,r2 ;reinit branch ptr endm example 4-3 pre-acs macro (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
algorithmic extensions collapsing the states viterbi decoder implementation 4-11 example 4-4 the acsflush macro ;***********************acsflush macro**************************** ; function: this routine is very similar to the acs macro, except that ; the encoding shift register is now flushing back to 0. ; this means that only the upper paths are taken, halving ; the number of states we need to update. the acs code is ; modified so that we don't compute the lower paths. in ; addition, state storage is modified so that survivor ; paths are stored in consecutive memory locations. survivors ; are even states on the first pass, then every fourth state,etc. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r4 should point to the storage location for updated state 0 ; n5 should be the number of input states/2 to process ; n5 is halved each time this macro is invoked ; outputs: ; updated path metrics/paths stored in xy memory ; registers used: ; a,b,y01,r2,n2,r3,n3,r4,r5,n5 r2 unchanged (modulo req'd) ; ; sa------nsa ; c / ; / ; / ; d/ ; / ; / ; sb ; ;***************************************************************** acsflush macro ;** move #bry,r2 ;r2 points to branch metrics move r5,r3 ;save r5 to init r4 later move r4,n3 ;save r4 to init r5 later move l:(r2)+n2,y ;get first branch metrics move l:(r5)+n5,a ;load 1st metric/path pair do n5,_nextstage ;update each state add y0,a l:(r5)-n5,b ;updt met,load next pair add y1,b l:(r2)+n2,y ;updt met,ld nxt br met max a,b (r5)+ ;sel surv,save met,inc st ptr move l:(r5)+n5,a ;ld next pair vsl b,0,l:(r4)+ ;end 2nd half _nextstage move n5,b ;recall flush count, loop count asr b n2,a ;divide it by 2,prep double branch jump move n3,r5 move b,n5 ;storage for flush count asl a (r2)-n2 ;reinit branch ptr move r3,r4 move a,n2 ;next pass, skip more branches endm f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
4-12 viterbi decoder implementation algorithmic extensions main: putting the pieces back together we have placed a comment at the beginning, moving the branch metric table address to r2. since this is done in the findmetrics routine, we don?t actually need to execute this, but it is included as a reminder that r2 needs to be set properly. the remainder of this routine is like the acs butterfly, except that parts are removed, and the number of iterations varies. we store the beginning values of r4 and r5 so that we can restore them at the end, swapped with each other. because the number of loop iterations varies, we don?t get the automatic swapping that the acs macro has. the first branch metrics are fetched, and the updated count value is stored in memory. we then load the first path metric/path pair. the loop is quite similar to the acs except we have eliminated unneeded operations. we add a branch metric and load the next path metric/path pair (for the lower path). we then add a branch metric and read the branch metrics for the next loop iteration. taking the max of the accumulators chooses the survivor path metric/path pair, and the path metric read pointer is incremented for the next loop in parallel. the move loads the path metric/path pair for the next loop. the vsl completes the current state update for the upper state. note that the state storage pointer is incremented only once, even though we only update the upper state. this means that state 2 will be written at the address normally used for state 1, state 4 at the address used for state 2, etc. the next invocation of the acsflush will read the data correctly, because the read pointer increment, r5 is halved for each invocation of this macro. during the next invocation, we will only have states 0, 4, 8, etc. these are exactly the states that are possible when the encoder is being 0 filled. to end this macro, we restore r4 and r5 to the desired values, swapping old and update memories as needed. the loop count is read from n5, halved, and restored, so that we process half as many states on the next loop invocation. 4.5 main: putting the pieces back together there are a number of changes to the main program that are required for these new macros to operate properly. the code for main appears in the following example. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
algorithmic extensions main: putting the pieces back together viterbi decoder implementation 4-13 example 4-5 main program code changes ;***************************main*********************************************** ; numstatesequ 32 encbits equ 6 ;most cases=log2(numstates)+1 noacs equ numstates/2 numinputsequ 168 even equ 1-(numinputs/8)%2 ;even set to 1/0 if numinputs is ; even/odd #bytes org p:$400 vitdec move #numstates/2-1,m2 ;r2 points to branch metric table move #brx,r2 move #state1,r5 ;r5 points to current state metric move #state2,r4 ;r4 points to updated state metric move #numstates*2-1,m4 ;both modulo to flip locations each sym move #numstates*2-1,m5 move #>1,n5 ;ctr/input metrics spacing for each butterfly ; move #numstates/2,n5 ;input metrics spacing for each butterfly move #pathout,n0 ;n0 points to storage for output paths move #-1,m3 ;set linear mode, traceback ptr move #>1,n2 ; move #indata,r1 ;r1 points to input data ; ;***********************preparation loop************************* ; this loop iterates by the number of bits used in the ; encoder to pre load the bit decisions. thereafter, ; the paths are updated and stored off in bytes. we need ; the preload bits so that the stored path metrics point ; correctly to their previous paths for traceback. ;**************************************************************** ; do #encbits-1,prelp ;preload decisions/trellis start ; findmetrics preacs ; acs prelp move #numstates/2,n5 ;n5 now serves as offset between fetch ; states ; ;*****************main loop--process bytes of data***************** do #numinputs/8-2,datalp;process bytes of output do #8,symlp ;8 bits per byte ; findmetrics acs symlp storepaths#numstates datalp f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
4-14 viterbi decoder implementation algorithmic extensions main: putting the pieces back together ;********postprocessing,last 2 info bytes & flush encoder back to 0**** ; ; these setups should accomodate encoders of 5 to 8 bits. ; encbits-1 preprocessed, so we have 16-encbits+1 bits left. ; we do encoder flushing for #encbits-1 bits. so we have ; (16-encbits+1) - encbits+1 do do before flushing. ; next, we process the encoder flush bits, stopping to store ; paths as needed. we store after 8 bits processed so this means ; store paths after 16-2*encbits+2 (from nonflush) +8-(16-2*encbits+2) ; (these are the additional flush bits needed before storing paths). ; finally, we finish out the data. the bits remaining are: ; 16-encbits+1-(16-2*encbits+2)-(8-(16-2*encbits+2)) ; = 8-encbits. ; ; for this example, we nonflush process ; 6 more bits, then flush 2, store path, then flush ; the last three to state 0. ; last 6 nonflush bits--- ;********************************************************************** do #16-(2*encbits)+2,last6 findmetrics acs ; last6 ; ; encoder flush----- ; move #>numstates/2,n5 ;flush, process 16,8, 4, then 2, ; then 1 state do #8-(16-(2*encbits)+2),flsh;process 3 of last five bits to get ; next to last byte findmetrics acsflush ; acs flsh storepaths#numstates/8 ; ; flush the last three bits------ ; do #8-encbits+1,flsh3 ; do last 3 bits to get last byte ; findmetrics acsflush ; acs flsh3 ; traceback ; finish nop example 4-5 main program code changes (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
algorithmic extensions main: putting the pieces back together viterbi decoder implementation 4-15 to make the changes to the main program code, we begin as we did in section 3 , by initializing the address registers. r2 is still used to address branch metrics. registers r4 and r5 address the states for path metric storage. the differences needed for the initialization code are to set n5 to 1 (needed for tracking the number of states in the preacs code), and n2 is set to 1. this is different from the code in section 3 for two reasons. first, the branch metric generation is different, and second, using n2 to update register r2 for branch metric storage is no longer required. instead, we use n2 to change the addresses to read branch metrics for the acsflush routine. next, we do encbits-1 iterations of input processing to preload the path storage registers. this is different because we use the preacs macro instead of the acs macro. using the preacs macro means that the code uses the starting encoder 0 state to reduce execution cycles as well as to avoid the need of initializing the path metric storage for all the states. the next major piece of code is the nested do loops that process bytes of decoder input and store off the resulting paths. this code is identical to the code in section 3 , except that we process one less byte of data. the end processing is more complicated because there are two subprocesses occurring at the same time and their endings are not in phase. one process is the storing of the path data in bytes as the path storage registers fill up. the second process is the acsflush processing, which collapses the trellis states back to 0. for this example, start with 168 decoder inputs. we preprocess five inputs, and then process 19*8=152 more inputs in bytes using the nested loops. we have 11 decoder inputs left to process. we need to do 8 more to get the next byte, but after processing 6 inputs, we have to change from using the acs macro to using the acsflush macro (used to collapse states for the last five inputs). accordingly, we process the first 6 of the last 11 decoder inputs using the acs macro. we then process the next two decoder inputs using the acsflush macro. now we have processed eight more decoder inputs, and can store off the path data using the storepaths macro. we then finish processing the last three decoder inputs using the acsflush macro. the final path data is in the collapsed state (as it was in the code in section 3 ), and is access in the traceback macro as before. after doing the traceback using the traceback macro of section 3 , the decoder is done at the finish label. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
4-16 viterbi decoder implementation algorithmic extensions main: putting the pieces back together f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
section 5 summary f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
5-2 viterbi decoder implementation summary 5.1 summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-3 5.2 conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-3 5.3 program listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-4 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
summary summary viterbi decoder implementation 5-3 5.1 summary this application report began with an explanation of the viterbi algorithm in section 2 , introduced a basic viterbi decoder in section 3 , and offered some enhancements to the decoding assembly code in section 4 . the explanations in section 2 and section 3 are detailed enough for the ideas behind the code to be understood. the goal is to make it easy to take and modify this code to produce efficient code for any desired application of the viterbi algorithm. we chose a nontrivial industry standard code as our example, and there are some statistics that are worth noting. table 5-1 presents some notable statistics for code presented in this note. in the table, we compare the basic viterbi decoder code of section 3 with the basic code in section 4 . the basic code requires about 215 clock cycles per decoder input to decoder the data. as an example, this means that at a received data rate of 13,350 per second (the is-136 rate if all six slots in a frame are voice data to be decoded) 2.87 mips are required. if we use generalized branch metrics, the numbers increase, because the branch metric calculation is more complicated. note, however, that they lower again as the pre acs and acsflush routines are included. once all the enhancements of section 4 are included, the computational load drops to 2.81 mips, more than 3% below what the rate would be if we just used the generalized branch metrics. 5.2 conclusions in conclusion, note that the relative improvement is dependent on the length of the data packet. shorter packets show greater improvement. for example, an encoded voice data slot in is-136 uses a packet size of only 89 decoder inputs instead of our example 168. for packets of this size, the percent change is 4.8%?large enough to make the extra code worth considering. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
5-4 viterbi decoder implementation summary program listings table 5-1 viterbi decoder code statistics another statistic worth noting is the program memory required by each program. the basic viterbi decoder code in section 3 takes 178 program words, and the code in section 4 with all enhancements added takes 311. the large increase is mostly due to the unrolling of some loops at the end, required to phase the acsflush and storepaths macros correctly. notably, the amount of program space required for either routine is quite small, and may not be a large consideration for most systems. 5.3 program listings finally, note that there are three appendices to this application report. each appendix contains a program listing for the program as described on one of the sections of this manual. appendix a contains a listing for the complete code discussed in section 3 . appendix b contains a complete listing of the code discussed in section 4 . appendix c contains a complete listing of a 24-bit version of the section 4 code, suitable for implementation on a dsp56300 dsp operating in 24-bit arithmetic mode. code used total clock cycles clock cycles per information bit mips at 13,350 data rate % change basic viterbi 36127 215.04 2.87 ? viterbi with generalized branch metrics 36668 218.26 3.31 +1.5% +acsflush 36065 214.67 2.87 -0.172% +pre acs 35459 211.07 2.81 -1.85% f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
appendix a basic algorithm program listing f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
a-2 viterbi decoder implementation basic algorithm program listing a.1 viterbi algorithm program listing . . . . . . . . . . . . . . a-3 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
basic algorithm program listing viterbi decoder implementation a-3 a.1 viterbi algorithm program listing this appendix contains the complete program listing for a 16-bit implementation of the viterbi algorithm, as presented in section 3 of this manual. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
a-4 viterbi decoder implementation basic algorithm program listing example a-1 basic 16-bit implementation of a viterbi decoder opt mex ;***********************viterbi decoder*********************************** ; this routine implements a convolutional decoder using the viterbi alg. ; it is optimized for speed, which means that every alu register ; and all of the r registers are used. a significant amount of ; these can be freed by storing and reusing registers between the ; findmetrics routines and the acs,acsflush routines. ; input is in indata real in x imag in y. output begins at y:decout ;*****************branch metric macro************************************* ; function: input data and generate branch metrics. ; for this decoder, the metric is a scaled ; sum or difference of the real and imag inputs. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r1 should point to the next input xy data pair ; outputs: ; branch metrics are stored at brx in xy memory ; registers used: ; a,b,x01,y01,r1,r2,n2, r2 unchanged (modulo req'd) ;**************************************************************** findmetrics macro move l:(r1)+,y ;grab dec input move #-16,x1 ;sign for real component, 0 sent. mpy x1,y1,a ;a has 0x partial branch move #bry+3,r2 ;storage for generated branch metrics mac x1,y0,a a,b ;a gets 00 branch mac -x1,y0,b ;b has 01 branch ; neg a a,x1 a,y:(r2)+n2 ;mv 00 to x1,11 to a, st 00 in location 3 neg b b,x0 b,y:(r2)+n2 ;mv 01 to x0,10 to b, st 01 in location 6 ;***************************************************************************** ; at this point x1 has 00, x0 has 01, ; a1 has 11, b1 has 10, needed for quick storage in y memory ;***************************************************************************** move x1,y:(r2)+n2 ;store 00 in location 9 move x0,y:(r2)+n2 ;store 01 in location 12 move b,y:(r2)+n2 ;store 10 in location 15 move b,y:(r2)+n2 ;store 10 in location 2 move b,y:(r2)+n2 ;store 10 in location 5 move b,y:(r2)+n2 ;store 10 in location 8 move x0,y:(r2)+n2 ;store 01 in location 11 move x1,y:(r2)+n2 ;store 00 in location 14 move x0,y:(r2)+n2 ;store 01 in location 1 move x1,y:(r2)+n2 ;store 00 in location 4 move a,y:(r2)+n2 ;store 11 in location 7 move a,y:(r2)+n2 ;store 11 in location 10 move a,y:(r2)+n2 ;store 11 in location 13 move a,y:(r2) ;store 11 in location 0. r2-> bry endm f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
basic algorithm program listing viterbi decoder implementation a-5 ; ;*******************viterbi add, compare, select butterfly macro*** ; function: update path metrics/paths for the viterbi algorithm by ; doing an add,compare,select update for state pairs. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r4 should point to the storage location for updated state 0 ; n5 should offset addresses by numstates/2 ; outputs: ; updated path metrics/paths stored in xy memory ; registers used: ; a,b,x0,y1,r2,r4,r5,n5 r2 unchanged (modulo req'd) ; registers: ; r5, pointer to the path metric/path table, arranged as ; x: path metric, y: path,states ordered assuming ; bits shift right to left. ; r4, pointer to the output path metric/path table ; r0, pointer to the branch metric table, arranged ; as x:c, y:d, cd,cd,cd, etc. ; ; sa------nsa ; \ c / ; d \ / ; \/ ; /\ ; d / \ ; / \ ; sb------nsb ; c ;***************************************************************************** acs macro ; ; move #bry,r2 ;r2 points to branch metrics move y:(r2)+,y1 ;get first branch metric move l:(r5)+n5,a ;load 1st metric/path pair ; do #noofacsbutt,_p_nextstage ;update each state sub y1,a l:(r5)-n5,b ;sub pt,br met,get next pair add y1,b ;update metrics max a,b l:(r5)+n5,a ;pick max,reload 1st pair vsl b,0,l:(r4)+ ;store survivor,end top half add y1,a l:(r5)-n5,b ;add pt,br met,reload next pair sub y1,b x:(r5)+,x0 y:(r2)+,y1 ;inc st ptr,ld nxt br met max a,b l:(r5)+n5,a ;pick max met,load next pair vsl b,1,l:(r4)+ ;st survivor,end 2nd half _p_nextstage nop ;needed to separate do loop ends endm example a-1 basic 16-bit implementation of a viterbi decoder (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
a-6 viterbi decoder implementation basic algorithm program listing ;***********************store partial path metrics macro*************** ; function: the storage is somewhat twisted. the stored paths are current ; up to the most recent input bit, which is not convenient for ; traceback. i process the data as follows: i pre ; loaded the path with 6 bits. thereafter, i process 8 path bits ; so the path has 14 bits. the most significant 8, i save. ; the remaining bits are the current encoder values for that path. ; the 5 lsb's are the current state. ; inputs: ; n0 should point to memory where the path bits are to be stored ; r5 should point to the latest path metric for state 0 ; outputs: ; updated paths storage x memory, n0 points to next path storage ; registers used: ; a,b,x1,r0,n0,r3,r5 ;***************************************************************************** ; store off path data in bytes to avoid overflow in path reg's ; storepaths macro lpcnt ; move n0,r3 ;n0 stores path data pointer move n0,r0 move #>$1f,x1 ;mask for 5 lsb's(numencbits of 1's) do lpcnt,_pstore1 ;store paths for each state move y:(r5),b ;grab path asr #5,b,a ;align bits 5-12 with a1 ; (the 8 bits beyond enc) and x1,b ;mask off 5 lsb's to return to path move a1,x:(r3)+ ;store 8 ms path bits move b1,y:(r5)+ ;return 5 ls path bits _pstore1 lua (r4+numstates),r5 ;r5 points to latest states lua (r0+numstates),n0 ;update path data pointer endm ; example a-1 basic 16-bit implementation of a viterbi decoder (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
basic algorithm program listing viterbi decoder implementation a-7 ;***********************traceback output path macro******************* ; function: to output the correct data, we begin at the end. we take the ; output path of the survivor state (0), and place its associated ; output path in memory as the last output data byte. then we use ; bits 3-7 of that data as an offset pointer to the correct traceback ; data of the next previous path data memory. we continue this until ; we have traced the data back to the begining. ; inputs: ; n0 should point to memory where the path bits are to be stored ; r5 should point to the latest path metric for state 0 ; outputs: ; decoder output data in y memory ; registers used: ; a,b,y,r0,n0,r2,r5 ; ;********************************************************************* ; store off path data in bytes to avoid overflow in path reg's ; traceback macro ; move n0,r0 ;path ptr to r0 move #0,n0 ;prep for traceback move #decout+(numinputs/8/2),r2 ;point to end to trace data move y:(r5),b ;recall last path move #-1,m2 ;r2 now linear move b1,x:(r0) ;save off last path data move #$513,x0 ;control word for extract ; ;******************************begin traceback***************************** if (even==0) move x:(r0+n0),a ;recall last path extractu x0,a,b ;bits 3-7 of a1 point to next data lua (r0-numstates),r0 ;dec r0 to next earlier state set lsl #8,a ;move to upper byte move b0,n0 ;load as offset for traceback move a1,y:(r2)- ;save off endif ; do #numinputs/8/2,trcbk ;once for each byte pair move x:(r0+n0),a ;recall last path extractu x0,a,b ;get ptr to next earlier path lua (r0-numstates),r0 ;point r0 to next earlier states move b0,n0 ;save ptr as offset move a1,x1 ;save out byte in x1 move x:(r0+n0),a ;do it all again! extractu x0,a,b lua (r0-numstates),r0 move b0,n0 lsl #8,a ;move to upper byte or x1,a ;or in last byte to get 16 bit word move a1,y:(r2)- ;store result trcbk endm example a-1 basic 16-bit implementation of a viterbi decoder (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
a-8 viterbi decoder implementation basic algorithm program listing ;***************************main*********************************************** numstatesequ 32 encbits equ 6 ;most cases=log2(numstates)+1 noofacsbuttequ numstates/2 numinputsequ 168 even equ 1-(numinputs/8)%2 ;even set to 1/0 if numinputs is ; even/odd #bytes org p:$400 vitdec move #numstates/2-1,m2 ;r2 points to branch metric table move #>3,n2 ;increment for branch metric storage move #state1,r5 ;r5 points to current state metric move #state2,r4 ;r4 points to updated state metric move #numstates*2-1,m4 ;both modulo to flip loc each sym move #numstates*2-1,m5 move #numstates/2,n5 ;input metrics spacing for each butter- fly move #pathout,n0 ;n0 points to storage for output paths move #-1,m3 ;set linear mode, traceback ptr move #indata,r1 ;r1 points to input data ;***********************preparation loop************************* ; this loop iterates by the number of bits used in the ; encoder to pre load the bit decisions. thereafter, ; the paths are updated and stored off in bytes. we need ; the preload bits so that the stored path metrics point ; correctly to their previous paths for traceback ;**************************************************************** do #encbits-1,prelp ;once for each encoder bit ; findmetrics acs prelp ;*****************main loop--process bytes of data***************** do #numinputs/8-1,datalp ;process bytes of output do #8,symlp ;8 bits per byte ; findmetrics acs symlp storepaths#numstates datalp ;********postprocessing,last info byte ******************************** ; ; encbits-1 preprocessed, so we have 8-encbits+1 bits left. ; for this example, we have 3 inputs left to process ;********************************************************************** ; the last three bits------ do #8-encbits+1,flsh1 ;process last 3 bits to get last byte findmetrics acs flsh1 ;*******traceback the path data to obtain the decoder output********** traceback fin nop example a-1 basic 16-bit implementation of a viterbi decoder (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
basic algorithm program listing viterbi decoder implementation a-9 ;**********************memory organization**************************** ; most of the memory location is important--the first two ; labels of the x and y data memory are paired and must ; be co located (at the same addresses). in addition, state1 ; and state2 must be located on a 0 mod 2*numstates boundary, ; x and y memory for input data must be paired --good luck..... ;***************************************************************************** ;********************************************************************* ; org x:$0 state1 dc $0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 state2 dc $0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 ; pathout ds numstates*(numinputs/8+1) indata ;this data encodes $1234,$5678,$9abc,$4973,$7925,$3491,$ad43,$ff21,$7ebb,$0100,$20 dc $a000,$a000,$a000,$6000,$6000,$a000,$a000,$6000 dc $6000,$6000,$6000,$6000,$6000,$a000,$a000,$6000 dc $a000,$6000,$a000,$6000,$a000,$6000,$a000,$6000 dc $a000,$a000,$6000,$6000,$6000,$a000,$a000,$a000 dc $a000,$a000,$a000,$a000,$a000,$a000,$a000,$a000 dc $a000,$6000,$6000,$a000,$a000,$a000,$a000,$a000 dc $a000,$a000,$a000,$a000,$a000,$6000,$6000,$a000 dc $6000,$a000,$6000,$a000,$6000,$6000,$6000,$6000 dc $a000,$a000,$6000,$6000,$a000,$a000,$a000,$6000 dc $a000,$6000,$a000,$6000,$6000,$a000,$6000,$a000 dc $a000,$a000,$6000,$a000,$a000,$a000,$a000,$6000 dc $6000,$6000,$6000,$a000,$6000,$6000,$6000,$6000 dc $6000,$6000,$a000,$a000,$a000,$a000,$6000,$6000 dc $a000,$a000,$6000,$a000,$a000,$a000,$a000,$a000 dc $a000,$6000,$6000,$a000,$a000,$a000,$a000,$a000 dc $6000,$a000,$6000,$a000,$6000,$6000,$a000,$a000 dc $6000,$6000,$6000,$a000,$a000,$6000,$a000,$6000 dc $6000,$6000,$a000,$a000,$a000,$6000,$a000,$a000 dc $a000,$a000,$6000,$6000,$6000,$a000,$a000,$6000 dc $6000,$a000,$6000,$a000,$6000,$a000,$a000,$a000 dc $a000,$a000,$6000,$6000,$a000,$6000,$a000,$6000 ;***************************************************************************** example a-1 basic 16-bit implementation of a viterbi decoder (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
a-10 viterbi decoder implementation basic algorithm program listing ;***************************************************************************** ; org y:$0 path1 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 path2 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 bry dsm numstates/2 decout ds numinputs/8 org y:@cvs(y,indata) ydata dc $a000,$a000,$a000,$6000,$a000,$6000,$a000,$6000 dc $a000,$6000,$a000,$a000,$6000,$6000,$a000,$6000 dc $a000,$a000,$6000,$a000,$6000,$6000,$6000,$a000 dc $6000,$6000,$6000,$6000,$6000,$6000,$a000,$6000 dc $6000,$6000,$6000,$a000,$a000,$a000,$6000,$a000 dc $a000,$a000,$6000,$6000,$6000,$a000,$6000,$a000 dc $6000,$6000,$6000,$6000,$a000,$6000,$a000,$a000 dc $6000,$6000,$a000,$6000,$6000,$6000,$a000,$6000 dc $a000,$6000,$6000,$a000,$a000,$6000,$a000,$a000 dc $a000,$a000,$a000,$6000,$a000,$a000,$6000,$6000 dc $6000,$a000,$6000,$a000,$a000,$6000,$a000,$6000 dc $6000,$6000,$a000,$a000,$6000,$a000,$6000,$a000 dc $a000,$6000,$6000,$a000,$a000,$6000,$a000,$a000 dc $a000,$a000,$a000,$a000,$a000,$6000,$a000,$6000 dc $a000,$6000,$a000,$6000,$6000,$6000,$6000,$6000 dc $a000,$a000,$a000,$a000,$a000,$6000,$6000,$a000 dc $a000,$a000,$a000,$6000,$a000,$a000,$6000,$a000 dc $6000,$6000,$a000,$6000,$6000,$6000,$a000,$a000 dc $6000,$6000,$a000,$a000,$6000,$a000,$a000,$6000 dc $a000,$6000,$6000,$6000,$6000,$a000,$a000,$a000 dc $a000,$a000,$6000,$a000,$6000,$6000,$6000,$6000 end example a-1 basic 16-bit implementation of a viterbi decoder (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
appendix b extended algorithm program listing f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
b-2 viterbi decoder implementation extended algorithm program listing b.1 16-bit enhanced viterbi decoder program listingb-3 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
extended algorithm program listing viterbi decoder implementation b-3 b.1 16-bit enhanced viterbi decoder program listing this appendix contains the complete 16-bit program listing for section 4 . the code in this section introduces several modifications to the code discussed in section 3 . generalized branch metrics are allowed, as well as the optimization of the code that reduces the computations when data occurs in blocks. example b-1 extended algorithm program listing opt mex ;****************56600**viterbi decoder*********************************** ; this routine implements a convolutional decoder using the viterbi alg. it is ; optimized for speed, which means that every alu register and all of the r registers ; are used. a significant amount of these can be freed by storing and reusing ; registers between the findmetrics routines and the acs,acsflush routines. ; input is at indata: real in x imag in y. output begins at y:decout ; global register use: a,b,x,y,r012345,n025,m245(are set to a modulo mode) ;*****************branch metric macro************************************* ; function: input data and generate branch metrics. this function ; subtracts the path metric for state 0 from all branch metrics ; to provide autonormaliztion. for this decoder, the metric is a scaled ; sum or difference of the real and imag inputs. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r1 should point to the next input xy data pair ; outputs: ; branch metrics are stored at brx in xy memory ; registers used: ; a,b,x01,y01,r1,r2,r5, r5 unchanged,r2 unchanged (modulo req'd) ;*****************findmetrics macro****************************************** findmetrics macro move x:(r5),a ;st. metric scaling, read last st. 0 neg a l:(r1)+,y ;negate metric|grab dec input move #-16,x1 ;sign for real component, upper br. mac x1,y1,a a,b ;comp 0x partial br. path mt. to move y1,x0 ;mv real input to x0 mac x1,y0,a a,y1 ;a gets 00 br. y1 gets 0x partial br subl a,b b,x0 ;a has 00 br, b has 11 br, save path tfr y1,a a,y1 ;swap 0x and 00 to compute 01 branch mac -x1,y0,a b,x1 ;a gets 01 br tfr x0,b b,y0 ;swap path metric,11 branch to y0 subl a,b y1,x0 ;b gets 10 br, x0 has copy of 00 br. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
b-4 viterbi decoder implementation extended algorithm program listing ;***************************************************************************** ; at this point, x is configured with br 11|00, y with 00|11, and ab ; with 01|10, ba with 10|01 all needed for quick storage in xy memory ;***************************************************************************** move x,l:(r2)+ ;store 11 in location 0. move ab,l:(r2)+ ;store 01 in location 1 move ba,l:(r2)+ ;store 10 in location 2 move y,l:(r2)+ ;store 00 in location 3 move y,l:(r2)+ ;store 00 in location 4 move ba,l:(r2)+ ;store 10 in location 5 move ab,l:(r2)+ ;store 01 in location 6 move x,l:(r2)+ ;store 11 in location 7 move ba,l:(r2)+ ;store 10 in location 8 move y,l:(r2)+ ;store 00 in location 9 move x,l:(r2)+ ;store 11 in location 10 move ab,l:(r2)+ ;store 01 in location 11 move ab,l:(r2)+ ;store 01 in location 12 move x,l:(r2)+ ;store 11 in location 13 move y,l:(r2)+ ;store 00 in location 14 move ba,l:(r2)+ ;store 10 in location 15 r2-> brx endm example b-1 extended algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
extended algorithm program listing viterbi decoder implementation b-5 ;*******************viterbi add, compare, select butterfly macro*** ; function: update path metrics/paths for the viterbi algorithm by ; doing an add,compare,select update for state pairs. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r4 should point to the storage location for updated state 0 ; n5 should offset addresses by numstates/2 ; outputs: ; updated path metrics/paths stored in xy memory ; registers used: ; a,b,y01,r2,r4,r5,n5 r2 unchanged (modulo req'd) ; registers: ; r5, pointer to the path metric/path table, arranged as ; x: path metric, y: path,states ordered assuming ; bits shift right to left. ; r4, pointer to the output path metric/path table ; r0, pointer to the branch metric table, arranged ; as x:c, y:d, cd,cd,cd, etc. ; ; sa------nsa ; \ c / ; d \ / ; \/ ; /\ ; d / \ ; / \ ; sb------nsb ; c ;***************************************************************** ; acs macro ; ; move #bry,r2 ;r2 points to branch metrics move l:(r2)+,y ;get first branch metric move l:(r5)+n5,a ;load 1st metric/path pair ; do #noacs,_p_nextstage ;update each state add y0,a l:(r5)-n5,b ;sum pt,br met,get next pair add y1,b ;update metric max a,b l:(r5)+n5,a ;pick max, reload 1st pair vsl b,0,l:(r4)+ ;save surivior, end top half add y1,a l:(r5)-n5,b ;sum pt,br,reload next pair add y0,b (r5)+ ;sum again,inc pt mt read ptr max a,b l:(r5)+n5,a ;pick max, load next pair move l:(r2)+,y ;load next branch metrics vsl b,1,l:(r4)+ ;write survivor,end 2nd half _p_nextstage nop ;needed to separate do loop ends endm example b-1 extended algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
b-6 viterbi decoder implementation extended algorithm program listing ;*******************************;preacs*************************************** ; function: update path metrics/paths for the viterbi algorithm by acs butterfly ; from assumed 0 state to start, and double the number of states on each invocation ; until the full trellis is used. this routine cannot be used unless the encoder is ; starting from an all 0's state. note this means it cannot be used if the data is ; a continuation of a previously processed data stream. use the acs macro instead. ; however, for packetised data, or other data that assumes the data starts with the ; encoder 0 filled, this routine saves cycles, and only state 0 needs to be ; initialised before starting the decode. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r4 should point to the storage location for updated state 0 ; n5 should be the number of input states/2 to process ; n5 is doubled each time this macro is invoked ; outputs: ; updated path metrics/paths stored in xy memory ; registers used: ; a,b,y01,r2,r3,n3,r4,r5,n5 r2 unchanged (modulo req'd) ; registers: ; r5, pointer to the path metric/path table, arranged as ; x: path metric, y: path,states ordered assuming ; bits shift right to left. ; r4, pointer to the output path metric/path table ; r0, pointer to the branch metric table, arranged ; as x:c, y:d, cd,cd,cd, etc. ; sa------nsa ; \ c ; d \ ; \ ; \ ; \ ; \ ; sb nsb ;***************************************************************** preacs macro ; move #bry,r2 ;r2 points to branch metrics move r5,r3 ;save r5 to init r4 later move r4,n3 ;save r4 to init r5 later move l:(r2)+,ba ;get first branch metrics move l:(r5)+,y ;load 1st metric/path pair do n5,_p_nextstage update each state add y,a ;update metric add y,b l:(r5)+,y ;updt met, ld nxt pair vsl a,0,l:(r4)+ ;end top half vsl b,1,l:(r4)+ ;end 2nd half move l:(r2)+,ba ;ld br met _p_nextstage move n5,b ;recall loop count asl b n3,r5 ;mult it by 2 move r3,r4 move b,n5 ;storage for loop count move #bry,r2 ;reinit branch ptr endm example b-1 extended algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
extended algorithm program listing viterbi decoder implementation b-7 ;***********************acsflush macro**************************** ; function: this routine is very similar to the acs macro, except that ; the encoding shift register is now flushing back to 0. ; this means that only the upper paths are taken, halving ; the number of states we need to update. the acs code is ; modified so that we don't compute the lower paths. in ; addition, state storage is modified so that survivor ; paths are stored in consecutive memory locations. ; survivors are even states on the first pass, then ; every fourth state, etc. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r4 should point to the storage location for updated state 0 ; n5 should be the number of input states/2 to process ; n5 is halved each time this macro is invoked ; outputs: ; updated path metrics/paths stored in xy memory ; registers used: ; a,b,y01,r2,n2,r3,n3,r4,r5,n5 r2 unchanged (modulo req'd) ; ; sa------nsa ; c / ; / ; / ; d/ ; / ; / ; sb ;***************************************************************** acsflush macro ;** move #bry,r2 ;r2 points to branch metrics move r5,r3 ;save r5 to init r4 later move r4,n3 ;save r4 to init r5 later move l:(r2)+n2,y ;get first branch metrics move l:(r5)+n5,a ;load 1st metric/path pair ; do n5,_nextstage ;update each state add y0,a l:(r5)-n5,b ;updt met,load next pair add y1,b l:(r2)+n2,y ;updt met,ld nxt br met max a,b (r5)+ ;sel surv,save met,inc st ptr move l:(r5)+n5,a ;ld next pair vsl b,0,l:(r4)+ ;end 2nd half _nextstage move n5,b ;recall flush count, loop count asr b n2,a ;divide it by 2,prep double branch jump move n3,r5 move b,n5 ;storage for flush count asl a (r2)-n2 ;reinit branch ptr move r3,r4 move a,n2 ;next pass, skip more branches endm example b-1 extended algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
b-8 viterbi decoder implementation extended algorithm program listing ;***********************store partial path metrics macro*************** ; function: the storage is somewhat twisted. the stored paths are ; current up to the most recent input bit, which is not ; convenient for traceback. as a result, i process the data as ; follows: i preloaded the path with 6 bits. thereafter, i ; process 8 path bits so the path has 14 bits. the most ; significant 8, i save. the remaining bits are the current ; encoder values for that path. the 5 lsb's are the current ; state. ; inputs: ; n0 should point to memory where the path bits are to be stored ; r5 should point to the latest path metric for state 0 ; outputs: ; updated paths storage x memory, n0 points to next path storage ; registers used: ; a,b,y1,r0,n0,r3,r5 ;********************************************************************* ; ; store off path data in bytes to avoid overflow in path reg's ; storepaths macro lpcnt ; move n0,r3 ;n0 stores path data pointer move n0,r0 move #>$1f,y1 ;mask for 5 lsb's(numencbits-1 of 1's) do lpcnt,_pstore1 ;store paths for each state move y:(r5),b ;grab path asr #5,b,a ;align bits 5-12 w/ a1 (8 bits beyond enc) and y1,b ;mask off 5 lsb's to return to path move a1,x:(r3)+ ;store 8 ms path bits move b1,y:(r5)+ ;return 5 ls path bits _pstore1 lua (r4+numstates),r5 ;r5 points to latest states lua (r0+numstates),n0 ;update path data pointer endm example b-1 extended algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
extended algorithm program listing viterbi decoder implementation b-9 ;***********************traceback output path macro******************* ; function: to output the correct data, we begin at the end. we take ; the output path of the survivor state (0), and place its ; associated output path in memory as the last output data ; byte. then we use bits 3-7 of that data as an offset pointer ; to the correct traceback data of the next previous path data ; memory. we continue this until we have traced the data back ; to the begining. ; inputs: ; n0 should point to memory where the path bits are to be stored ; r5 should point to the latest path metric for state 0 ; outputs: ; decoder output data in y memory ; registers used: ; a,b,y,r0,n0,r2,r5 ; ;********************************************************************* ; store off path data in bytes to avoid overflow in path reg's traceback macro ; move n0,r0 ;path ptr to r0 move #0,n0 ;prep for traceback move #decout+(numinputs/8/2),r2 ;point to end of output buffer move y:(r5),b ;recall last path move #-1,m2 ;r2 now linear move b1,x:(r0) ;save off last path data move #$513,y0 ;control word for extract ; ;*****************begin traceback***************************** if (even==0) move x:(r0+n0),a ;recall last path extractu y0,a,b ;bits 3-7 of a1 point to next data lua (r0-numstates),r0 ;dec r0 to next earlier path set lsl #8,a ;move to upper byte move b0,n0 ;load as offset for traceback move a1,y:(r2)- ;save off endif ; do #numinputs/8/2,trcbk ;once for each byte pair move x:(r0+n0),a ;recall last path extractu y0,a,b ;get ptr to next earlier path lua (r0-numstates),r0 ;point r0 to next earlier states move b0,n0 ;save ptr as offset move a1,y1 ;save out byte in x1 move x:(r0+n0),a ;do it all again! extractu y0,a,b lua (r0-numstates),r0 move b0,n0 lsl #8,a ;move to upper byte or y1,a ;or in last byte to get 16 bit word move a1,y:(r2)- ;store result trcbk endm example b-1 extended algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
b-10 viterbi decoder implementation extended algorithm program listing ;***************************main*********************************************** ; numstatesequ 32 encbits equ 6 ;most cases=log2(numstates)+1 noacs equ numstates/2 numinputsequ 168 even equ 1-(numinputs/8)%2 ;even set to 1/0 if numinputs is ; even/odd #bytes org p:$400 vitdec move #numstates/2-1,m2 ;r2 points to branch metric table move #brx,r2 move #state1,r5 ;r5 points to current state metric move #state2,r4 ;r4 points to updated state metric move #numstates*2-1,m4 ;both modulo to flip locations each sym move #numstates*2-1,m5 move #>1,n5 ;ctr/input metrics spacing for each butterfly ; move #numstates/2,n5 ;input metrics spacing for each butterfly move #pathout,n0 ;n0 points to storage for output paths move #-1,m3 ;set linear mode, traceback ptr move #>1,n2 ; move #indata,r1 ;r1 points to input data ; ;***********************preparation loop************************* ; this loop iterates by the number of bits used in the ; encoder to pre load the bit decisions. thereafter, ; the paths are updated and stored off in bytes. we need ; the preload bits so that the stored path metrics point ; correctly to their previous paths for traceback. ;**************************************************************** ; do #encbits-1,prelp ;preload decisions/trellis start ; findmetrics preacs ; acs prelp move #numstates/2,n5 ;n5 now serves as offset between fetch ; states ; ;*****************main loop--process bytes of data***************** do #numinputs/8-2,datalp;process bytes of output do #8,symlp ;8 bits per byte ; findmetrics acs symlp storepaths#numstates datalp example b-1 extended algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
extended algorithm program listing viterbi decoder implementation b-11 ;********postprocessing,last 2 info bytes & flush encoder back to 0**** ; ; these setups should accomodate encoders of 5 to 8 bits. ; encbits-1 preprocessed, so we have 16-encbits+1 bits left. ; we do encoder flushing for #encbits-1 bits. so we have ; (16-encbits+1) - encbits+1 do do before flushing. ; next, we process the encoder flush bits, stopping to store ; paths as needed. we store after 8 bits processed so this means ; store paths after 16-2*encbits+2 (from nonflush) +8-(16-2*encbits+2) ; (these are the additional flush bits needed before storing paths). ; finally, we finish out the data. the bits remaining are: ; 16-encbits+1-(16-2*encbits+2)-(8-(16-2*encbits+2)) ; = 8-encbits. ; ; for this example, we nonflush process ; 6 more bits, then flush 2, store path, then flush ; the last three to state 0. ; last 6 nonflush bits--- ;********************************************************************** do #16-(2*encbits)+2,last6 findmetrics acs ; last6 ; ; encoder flush----- ; move #>numstates/2,n5 ;flush, process 16,8, 4, then 2, ; then 1 state do #8-(16-(2*encbits)+2),flsh;process 3 of last five bits to get ; next to last byte findmetrics acsflush ; acs flsh storepaths#numstates/8 ; ; flush the last three bits------ ; do #8-encbits+1,flsh3 ; do last 3 bits to get last byte ; findmetrics acsflush ; acs flsh3 ; traceback ; finish nop example b-1 extended algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
b-12 viterbi decoder implementation extended algorithm program listing ;**********************memory organization**************************** ; most of the memory location is important--the first two ; labels of the x and y data memory are paired and must ; be co located (at the same addresses). in addition, state1 ; and state2 must be located on a 0 mod 2*numstates boundary, ; x and y memory for input data must be paired --good luck..... ;********************************************************************* ; org x:$0 state1 dc $0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 state2 dc $0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 brx dsm numstates/2 ; pathout ds numstates*(numinputs/8+1) indata ;this data encodes $1234,$5678,$9abc,$4973,$7925,$3491,$ad43,$ff21,$7ebb,$0100,$20 dc $a000,$a000,$a000,$6000,$6000,$a000,$a000,$6000 dc $6000,$6000,$6000,$6000,$6000,$a000,$a000,$6000 dc $a000,$6000,$a000,$6000,$a000,$6000,$a000,$6000 dc $a000,$a000,$6000,$6000,$6000,$a000,$a000,$a000 dc $a000,$a000,$a000,$a000,$a000,$a000,$a000,$a000 dc $a000,$6000,$6000,$a000,$a000,$a000,$a000,$a000 dc $a000,$a000,$a000,$a000,$a000,$6000,$6000,$a000 dc $6000,$a000,$6000,$a000,$6000,$6000,$6000,$6000 dc $a000,$a000,$6000,$6000,$a000,$a000,$a000,$6000 dc $a000,$6000,$a000,$6000,$6000,$a000,$6000,$a000 dc $a000,$a000,$6000,$a000,$a000,$a000,$a000,$6000 dc $6000,$6000,$6000,$a000,$6000,$6000,$6000,$6000 dc $6000,$6000,$a000,$a000,$a000,$a000,$6000,$6000 dc $a000,$a000,$6000,$a000,$a000,$a000,$a000,$a000 dc $a000,$6000,$6000,$a000,$a000,$a000,$a000,$a000 dc $6000,$a000,$6000,$a000,$6000,$6000,$a000,$a000 dc $6000,$6000,$6000,$a000,$a000,$6000,$a000,$6000 dc $6000,$6000,$a000,$a000,$a000,$6000,$a000,$a000 dc $a000,$a000,$6000,$6000,$6000,$a000,$a000,$6000 dc $6000,$a000,$6000,$a000,$6000,$a000,$a000,$a000 dc $a000,$a000,$6000,$6000,$a000,$6000,$a000,$6000 ; org y:$0 path1 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 path2 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 example b-1 extended algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
extended algorithm program listing viterbi decoder implementation b-13 bry dsm numstates/2 decout ds numinputs/8 org y:@cvs(y,indata) ydata dc $a000,$a000,$a000,$6000,$a000,$6000,$a000,$6000 dc $a000,$6000,$a000,$a000,$6000,$6000,$a000,$6000 dc $a000,$a000,$6000,$a000,$6000,$6000,$6000,$a000 dc $6000,$6000,$6000,$6000,$6000,$6000,$a000,$6000 dc $6000,$6000,$6000,$a000,$a000,$a000,$6000,$a000 dc $a000,$a000,$6000,$6000,$6000,$a000,$6000,$a000 dc $6000,$6000,$6000,$6000,$a000,$6000,$a000,$a000 dc $6000,$6000,$a000,$6000,$6000,$6000,$a000,$6000 dc $a000,$6000,$6000,$a000,$a000,$6000,$a000,$a000 dc $a000,$a000,$a000,$6000,$a000,$a000,$6000,$6000 dc $6000,$a000,$6000,$a000,$a000,$6000,$a000,$6000 dc $6000,$6000,$a000,$a000,$6000,$a000,$6000,$a000 dc $a000,$6000,$6000,$a000,$a000,$6000,$a000,$a000 dc $a000,$a000,$a000,$a000,$a000,$6000,$a000,$6000 dc $a000,$6000,$a000,$6000,$6000,$6000,$6000,$6000 dc $a000,$a000,$a000,$a000,$a000,$6000,$6000,$a000 dc $a000,$a000,$a000,$6000,$a000,$a000,$6000,$a000 dc $6000,$6000,$a000,$6000,$6000,$6000,$a000,$a000 dc $6000,$6000,$a000,$a000,$6000,$a000,$a000,$6000 dc $a000,$6000,$6000,$6000,$6000,$a000,$a000,$a000 dc $a000,$a000,$6000,$a000,$6000,$6000,$6000,$6000 end example b-1 extended algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
b-14 viterbi decoder implementation extended algorithm program listing f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
appendix c 24-bit algorithm program listing f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
c-2 viterbi decoder implementation 24-bit algorithm program listing c.1 24-bit enhanced viterbi decoder program listingc-3 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
24-bit algorithm program listing viterbi decoder implementation c-3 c.1 24-bit enhanced viterbi decoder program listing this section contains a second complete program listing for section 4 . this program is nearly identical to the program listing in appendix b . the difference is that the data is 24-bit data, meant to run on a 56300 family digital signal processor in 24-bit arithmetic mode. in addition, the path data is stored and recovered in 16-bit words instead of 8-bit bytes. the longer register allow us to reduce the memory and some complexity in the traceback. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
c-4 viterbi decoder implementation 24-bit algorithm program listing example c-1 24-bit algorithm program listing opt mex ;****************56600**24 bit viterbi decoder***************************** ; this routine implements a convolutional decoder using the viterbi alg. ; it is optimized for speed, which means that every alu register ; and all of the r registers are used. a significant amount of ; these can be freed by storing and reusing registers between the ; findmetrics routines and the acs,acsflush routines. ; input is at indata: real in x imag in y. output begins at y:decout ; global register use: a,b,x,y,r012345,n025,m245(are set to a modulo mode) ;************************************************************************* ; ;*****************branch metric macro************************************* ; function: input data and generate branch metrics. this function ; subtracts the path metric for state 0 from all branch metrics ; to provide autonormaliztion. for this decoder, the metric is a scaled ; sum or difference of the real and imag inputs. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r1 should point to the next input xy data pair ; outputs: ; branch metrics are stored at brx in xy memory ; registers used: ; a,b,x01,y01,r1,r2,r5, r5 unchanged,r2 unchanged (modulo req'd) ;**************************************************************** ; findmetricsmacro move x:(r5),a ;st. metric scaling, read last st. 0 neg a l:(r1)+,y ;negate metric|grab dec input move #-16,x1 ;sign for real component, upper br. mac x1,y1,a a,b move y1,x0 ;cp metric to b|mv real in to x0 mac x1,y0,a a,y1 ;y1 gets 0x partial branch subl a,b b,x0 ;a has 00 br., b has 11 branch, save st. tfr y1,a a,y1 ;swap 0x and 00 to compute 01 branch mac -x1,y0,a b,x1 ;a gets 01 br,x1 gets cp of 11 br. tfr x0,b b,y0 ;swap st metric,11 branch to y0 subl a,b y1,x0 ;b gets 10 br, x0 has copy of 00 br. f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
24-bit algorithm program listing viterbi decoder implementation c-5 ;***************************************************************************** ; at this point, x is configured with br 11|00, y with 00|11, and ; ab with 01|10, ba with 10|01 all needed for quick storage in xy memory ;***************************************************************************** move x,l:(r2)+ ;store 11 in location 0. move ab,l:(r2)+ ;store 01 in location 1 move ba,l:(r2)+ ;store 10 in location 2 move y,l:(r2)+ ;store 00 in location 3 move y,l:(r2)+ ;store 00 in location 4 move ba,l:(r2)+ ;store 10 in location 5 move ab,l:(r2)+ ;store 01 in location 6 move x,l:(r2)+ ;store 11 in location 7 move ba,l:(r2)+ ;store 10 in location 8 move y,l:(r2)+ ;store 00 in location 9 move x,l:(r2)+ ;store 11 in location 10 move ab,l:(r2)+ ;store 01 in location 11 move ab,l:(r2)+ ;store 01 in location 12 move x,l:(r2)+ ;store 11 in location 13 move y,l:(r2)+ ;store 00 in location 14 move ba,l:(r2)+ ;store 10 in location 15 r2-> brx endm example c-1 24-bit algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
c-6 viterbi decoder implementation 24-bit algorithm program listing ;*******************viterbi add, compare, select butterfly macro*** ; function: update path metrics/paths for the viterbi algorithm by ; doing an add,compare,select update for state pairs. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r4 should point to the storage location for updated state 0 ; n5 should offset addresses by numstates/2 ; outputs: ; updated path metrics/paths stored in xy memory ; registers used: ; a,b,y01,r2,r4,r5,n5 r2 unchanged (modulo req'd) ; registers: ; r5, pointer to the path metric/path table, arranged as ; x: path metric, y: path,states ordered assuming ; bits shift right to left. ; r4, pointer to the output path metric/path table ; r0, pointer to the branch metric table, arranged ; as x:c, y:d, cd,cd,cd, etc. ; ; sa------nsa ; \ c / ; d \ / ; \/ ; /\ ; d / \ ; / \ ; sb------nsb ; c ;***************************************************************** ; acs macro ; ; move #bry,r2 ;r2 points to branch metrics move l:(r2)+,y ;get first branch metric move l:(r5)+n5,a ;load 1st metric/path pair ; do #noacs,_p_nextstage ;update each state add y0,a l:(r5)-n5,b ;get next pair add y1,b ;update metrics max a,b l:(r5)+n5,a ;reload 1st pair vsl b,0,l:(r4)+ ;end top half add y1,a l:(r5)-n5,b ;reload next pair add y0,b (r5)+ ;inc st ptr max a,b l:(r5)+n5,a ;load next pair move l:(r2)+,y ;load next branch metrics vsl b,1,l:(r4)+ ;end 2nd half _p_nextstage nop ;needed to separate do loop ends endm example c-1 24-bit algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
24-bit algorithm program listing viterbi decoder implementation c-7 ;*******************************;preacs*************************************** ;function: update path metrics/paths for the viterbi algorithm by the acs ; butterfly from assumed 0 state to start, and double the number of states ; on each invocation until the full trellis is used. this routine cannot ; be used unless the encoder is starting from an all 0's state. note this ; means it cannot be used if the data is a continuation of a previously ; processed data stream. use the acs macro instead. however, for packetised ; data, or other data that assumes the data starts with the encoder 0 filled, ; this routine saves cycles, and only state 0 needs to be initialised ; before starting the decode. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r4 should point to the storage location for updated state 0 ; n5 should be the number of input states/2 to process ; n5 is doubled each time this macro is invoked ; outputs: ; updated path metrics/paths stored in xy memory ; registers used: ; a,b,y01,r2,r3,n3,r4,r5,n5 r2 unchanged (modulo req'd) ; registers: ; r5, pointer to the path metric/path table, arranged as x: path metric, ; y: path,states ordered assuming bits shift right to left. ; r4, pointer to the output path metric/path table ; r0, pointer to the branch metric table, arranged as x:c, y:d, cd,cd, etc. ; ; sa------nsa ; \ c ; d \ ; \ ; \ ; \ ; \ ; sb nsb ;***************************************************************** preacs macro ; move #bry,r2 ;r2 points to branch metrics move r5,r3 ;save r5 to init r4 later move r4,n3 ;save r4 to init r5 later move l:(r2)+,ba ;get first branch metrics move l:(r5)+,y ;load 1st metric/path pair do n5,_p_nextstage ;update each state add y,a ;update metric add y,b l:(r5)+,y ;updt met, ld nxt pair vsl a,0,l:(r4)+ ;end top half vsl b,1,l:(r4)+ ;end 2nd half move l:(r2)+,ba ;ld br met _p_nextstage move n5,b ;recall loop count asl b n3,r5 ;mult it by 2 move r3,r4 move b,n5 ;storage for loop count move #bry,r2 ;reinit branch ptr endm example c-1 24-bit algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
c-8 viterbi decoder implementation 24-bit algorithm program listing ;***********************acsflush macro**************************** ; function: this routine is very similar to the acs macro, except that ; the encoding shift register is now flushing back to 0. ; this means that only the upper paths are taken, halving ; the number of states we need to update. the acs code is ; modified so that we don't compute the lower paths. in ; addition, state storage is modified so that survivor ; paths are stored in consecutive memory locations. ; survivors are even states on the first pass, then ; every fourth state, etc. ; inputs: ; r2 should point to the beginning of the branch metric table ; r5 should point to the latest path metric for state 0 ; r4 should point to the storage location for updated state 0 ; n5 should be the number of input states/2 to process ; n5 is halved each time this macro is invoked ; outputs: ; updated path metrics/paths stored in xy memory ; registers used: ; a,b,y01,r2,n2,r3,n3,r4,r5,n5 r2 unchanged (modulo req'd) ; ; sa------nsa ; c / ; / ; / ; d/ ; / ; / ; sb ; ;***************************************************************** acsflush macro ;** move #bry,r2 ;r2 points to branch metrics move r5,r3 ;save r5 to init r4 later move r4,n3 ;save r4 to init r5 later move l:(r2)+n2,y ;get first branch metrics move l:(r5)+n5,a ;load 1st metric/path pair ; do n5,_nextstage ;update each state add y0,a l:(r5)-n5,b ;updt met,load next pair add y1,b l:(r2)+n2,y ;updt met,ld nxt br met max a,b (r5)+ ;sel surv,save met,inc st ptr move l:(r5)+n5,a ;ld next pair vsl b,0,l:(r4)+ ;end 2nd half _nextstage move n5,b ;recall flush count, loop count asr b n2,a ;divide it by 2,prep double branch jump move n3,r5 move b,n5 ;storage for flush count asl a (r2)-n2 ;reinit branch ptr move r3,r4 move a,n2 ;next pass, skip more branches endm example c-1 24-bit algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
24-bit algorithm program listing viterbi decoder implementation c-9 ;***********************store partial path metrics macro*************** ; function: the storage is somewhat twisted. the stored paths are current ; up to the most recent input bit, which is not convenient for ; traceback. as a result, i process the data as follows: i pre ; loaded the path with 6 bits. thereafter, i process 8 path bits ; so the path has 14 bits. the most significant 8, i save. ; the remaining bits are the current encoder values for that path. ; the 5 lsb's are the current state. ; inputs: ; n0 should point to memory where the path bits are to be stored ; r5 should point to the latest path metric for state 0 ; outputs: ; updated paths storage x memory, n0 points to next path storage ; registers used: ; a,b,y1,r0,n0,r3,r5 ;********************************************************************* ; ; store off path data in wordsto avoid overflow in path reg's ; storepaths macro lpcnt ; move n0,r3 ;n0 stores path data pointer move n0,r0 move #>$1f,y1 ;mask for 1 lsb's(numencbits-1 of 1's) do lpcnt,_pstore1 ;store paths for each state move y:(r5),b ;grab path asr #5,b,a ;align bits5-20 w/ a1 (16 bits beyond enc) and y1,b ;mask off 5 lsb's to return to path move a1,x:(r3)+ ;store 16 ms path bits move b1,y:(r5)+ ;return 5 ls path bits _pstore1 lua (r4+numstates),r5 ;r5 points to latest states lua (r0+numstates),n0 ;update path data pointer endm example c-1 24-bit algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
c-10 viterbi decoder implementation 24-bit algorithm program listing ;***********************traceback output path macro******************* ; function: to output the correct data, we begin at the end. we take the ; output path of the survivor state (0), and place its associated ; output path in memory as the last output data word. then we use ; bits 11-15 of that data as an offset pointer to the correct traceback ; data of the next previous path data memory. we continue this until ; we have traced the data back to the begining. ; inputs: ; n0 should point to memory where the path bits are to be stored ; r5 should point to the latest path metric for state 0 ; outputs: ; decoder output data in y memory ; registers used: ; a,b,y,r0,n0,r2,r5 ; ;********************************************************************* ; ; store off path data in words to avoid overflow in path reg's ; traceback macro ; move n0,r0 ;path ptr to r0 move #0,n0 ;prep for traceback move #decout+(numinputs/8/2),r2 ;point to end of output buffer move y:(r5),b ;recall last path move #-1,m2 ;r2 now linear move b1,x:(r0) ;save off last path data move #$5023,y0 ;control word for extract ; ;*****************begin traceback***************************** ; if (even==0) move x:(r0+n0),a ;recall last path extractu y0,a,b ;bits 11-15 of a1 point to next data lua (r0-numstates),r0 ;dec r0 to next earlier state set lsl #8,a ;move to upper byte move b0,n0 ;load as offset for traceback move a1,y:(r2)- ;save off endif ; do #numinputs/8/2,trcbk ;once for each byte pair move x:(r0+n0),a ;recall last path extractu y0,a,b ;get ptr to next earlier path lua (r0-numstates),r0 ;point r0 to next earlier states move b0,n0 ;save ptr as offset move a1,y:(r2)- ;store result trcbk endm example c-1 24-bit algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
24-bit algorithm program listing viterbi decoder implementation c-11 ;***************************main*********************************************** ; numstatesequ 32 encbits equ 6 ;most cases=log2(numstates)+1 noacs equ numstates/2 numinputsequ 168 even equ 1-(numinputs/8)%2 ;even set to 1/0 if numinputs is ; even/odd #bytes org p:$400 vitdec move #numstates/2-1,m2 ;r2 points to branch metric table move #brx,r2 move #state1,r5 ;r5 points to current state metric move #state2,r4 ;r4 points to updated state metric move #numstates*2-1,m4 ;modulo to flip locations each sym move #numstates*2-1,m5 move #>1,n5 ;ctr/input metrics spacing for each ; butterfly ; move #numstates/2,n5 ;input spacing for each butterfly move #pathout,n0 ;n0 points to stor for output paths move #-1,m3 ;set linear mode, traceback ptr move #>1,n2 ; move #indata,r1 ;r1 points to input data ; ;***********************preparation loop************************* ; this loop iterates by the number of bits used in the ; encoder to pre load the bit decisions. thereafter, ; the paths are updated and stored off in bytes. we need ; the preload bits so that the stored path metrics point ; correctly to their previous paths for traceback. ;**************************************************************** ; do #encbits-1,prelp ;preload decisions/trellis start ; findmetrics preacs ; acs prelp move #numstates/2,n5 ;n5 now serves as offset between ; fetch states ;*****************main loop--process bytes of data***************** do #numinputs/16-1,datalp ;process bytes of output do #16,symlp ;16 bits per byte ; findmetrics acs symlp storepaths#numstates datalp example c-1 24-bit algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
c-12 viterbi decoder implementation 24-bit algorithm program listing ;********postprocessing, flush encoder back to 0**** ; assumes that there are an odd number of bytes of data!!! ; this will need redoing if the number of data bytes is even. ; these setups should accomodate encoders of 5 to 8 bits. ; encbits-1 preprocessed, 152 in main loop done ; so we have 24-encbits+1 bits left. ; we do encoder flushing for #encbits-1 bits. so we have ; (24-encbits+1) - encbits+1 do do before flushing. ; next, we process the encoder flush bits, stopping to store ; paths as needed. we store after 16 bits processed so this means ; store paths after 24-2*encbits+2 (from nonflush) +16-(24-2*encbits+2) ; (these are the additional flush bits needed before storing paths). ; finally, we finish out the data. the bits remaining are: ; 24-encbits+1-(24-2*encbits+2)-(16-(24-2*encbits+2)) ; = 8-encbits+1. ; ; for this example, we nonflush process ; 14 more bits, then flush 2, store path, then flush ; the last three to state 0. ; last 14 nonflush bits--- ;********************************************************************** do #24-(2*encbits)+2,last14 findmetrics acs ; last14 ; ; encoder flush----- ; move #>numstates/2,n5 ;flush, process 16,8, 4, then 2, ; then 1 state do #16-(24-(2*encbits)+2),flsh;process 2 of last five bits to get ; next to last byte findmetrics acsflush ; acs flsh storepaths#numstates/8 ; ; flush the last three bits------ ; do #8-encbits+1,flsh3 ;process last 3 bits to get last ; byte findmetrics acsflush ; acs flsh3 ; traceback ; finish nop example c-1 24-bit algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
24-bit algorithm program listing viterbi decoder implementation c-13 ;**********************memory organization**************************** ; most of the memory location is important--the first two ; labels of the x and y data memory are paired and must ; be co located (at the same addresses). in addition, state1 ; and state2 must be located on a 0 mod 2*numstates boundary, ; x and y memory for input data must be paired --good luck..... ;********************************************************************* ; org x:$0 state1 dc $0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 state2 dc $0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 brx dsm numstates/2 ; pathout ds numstates*(numinputs/8+1) indata ;this data encodes $1234,$5678,$9abc,$4973,$7925,$3491,$ad43,$ff21,$7ebb,$0100,$20 dc $a00000,$a00000,$a00000,$600000,$600000,$a00000,$a00000,$600000 dc $600000,$600000,$600000,$600000,$600000,$a00000,$a00000,$600000 dc $a00000,$600000,$a00000,$600000,$a00000,$600000,$a00000,$600000 dc $a00000,$a00000,$600000,$600000,$600000,$a00000,$a00000,$a00000 dc $a00000,$a00000,$a00000,$a00000,$a00000,$a00000,$a00000,$a00000 dc $a00000,$600000,$600000,$a00000,$a00000,$a00000,$a00000,$a00000 dc $a00000,$a00000,$a00000,$a00000,$a00000,$600000,$600000,$a00000 dc $600000,$a00000,$600000,$a00000,$600000,$600000,$600000,$600000 dc $a00000,$a00000,$600000,$600000,$a00000,$a00000,$a00000,$600000 dc $a00000,$600000,$a00000,$600000,$600000,$a00000,$600000,$a00000 dc $a00000,$a00000,$600000,$a00000,$a00000,$a00000,$a00000,$600000 dc $600000,$600000,$600000,$a00000,$600000,$600000,$600000,$600000 dc $600000,$600000,$a00000,$a00000,$a00000,$a00000,$600000,$600000 dc $a00000,$a00000,$600000,$a00000,$a00000,$a00000,$a00000,$a00000 dc $a00000,$600000,$600000,$a00000,$a00000,$a00000,$a00000,$a00000 dc $600000,$a00000,$600000,$a00000,$600000,$600000,$a00000,$a00000 dc $600000,$600000,$600000,$a00000,$a00000,$600000,$a00000,$600000 dc $600000,$600000,$a00000,$a00000,$a00000,$600000,$a00000,$a00000 dc $a00000,$a00000,$600000,$600000,$600000,$a00000,$a00000,$600000 dc $600000,$a00000,$600000,$a00000,$600000,$a00000,$a00000,$a00000 dc $a00000,$a00000,$600000,$600000,$a00000,$600000,$a00000,$600000 ; example c-1 24-bit algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
org y:$0 path1 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 path2 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 dc $0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0 bry dsm numstates/2 decout ds numinputs/8 org y:@cvs(y,indata) ydata dc $a00000,$a00000,$a00000,$600000,$a00000,$600000,$a00000,$600000 dc $a00000,$600000,$a00000,$a00000,$600000,$600000,$a00000,$600000 dc $a00000,$a00000,$600000,$a00000,$600000,$600000,$600000,$a00000 dc $600000,$600000,$600000,$600000,$600000,$600000,$a00000,$600000 dc $600000,$600000,$600000,$a00000,$a00000,$a00000,$600000,$a00000 dc $a00000,$a00000,$600000,$600000,$600000,$a00000,$600000,$a00000 dc $600000,$600000,$600000,$600000,$a00000,$600000,$a00000,$a00000 dc $600000,$600000,$a00000,$600000,$600000,$600000,$a00000,$600000 dc $a00000,$600000,$600000,$a00000,$a00000,$600000,$a00000,$a00000 dc $a00000,$a00000,$a00000,$600000,$a00000,$a00000,$600000,$600000 dc $600000,$a00000,$600000,$a00000,$a00000,$600000,$a00000,$600000 dc $600000,$600000,$a00000,$a00000,$600000,$a00000,$600000,$a00000 dc $a00000,$600000,$600000,$a00000,$a00000,$600000,$a00000,$a00000 dc $a00000,$a00000,$a00000,$a00000,$a00000,$600000,$a00000,$600000 dc $a00000,$600000,$a00000,$600000,$600000,$600000,$600000,$600000 dc $a00000,$a00000,$a00000,$a00000,$a00000,$600000,$600000,$a00000 dc $a00000,$a00000,$a00000,$600000,$a00000,$a00000,$600000,$a00000 dc $600000,$600000,$a00000,$600000,$600000,$600000,$a00000,$a00000 dc $600000,$600000,$a00000,$a00000,$600000,$a00000,$a00000,$600000 dc $a00000,$600000,$600000,$600000,$600000,$a00000,$a00000,$a00000 dc $a00000,$a00000,$600000,$a00000,$600000,$600000,$600000,$600000 end example c-1 24-bit algorithm program listing (continued) f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .

▲Up To Search▲

Price & Availability of DSP56600

	To Download DSP56600 Datasheet File
If you can't view the Datasheet, Please click here to try to view without PDF Reader .